System Design Interview Preparation 2025
🎯 Strategy to Crack 80-90% System Design Interviews
This comprehensive guide covers Low-Level Design (LLD) and High-Level Design (HLD) topics that appear in 80-90% of system design interviews at FAANG and top tech companies. Master these patterns to excel in both junior and senior engineering roles.
Coverage Overview
| Category | Topics | Priority | Time to Master |
|---|---|---|---|
| LLD Fundamentals | 8 | 🔴 Critical | 2 weeks |
| LLD Design Problems | 15 | 🔴 Critical | 3 weeks |
| Design Patterns | 12 | 🟡 High | 2 weeks |
| HLD Fundamentals | 10 | 🔴 Critical | 2 weeks |
| HLD Design Problems | 20 | 🔴 Critical | 4 weeks |
| System Components | 15 | 🟡 High | 2 weeks |
| Scalability Patterns | 10 | 🔴 Critical | 1 week |
| Databases & Storage | 8 | 🔴 Critical | 1.5 weeks |
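Total Preparation Time: 12-16 weeks with consistent practice (2-3 hours/day). The per-category estimates above sum to 17.5 weeks, so the lower total only holds if some categories are studied in parallel.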
LOW-LEVEL DESIGN (LLD)
Understanding LLD
What is LLD?
- Object-oriented design of individual components
- Class diagrams, relationships, and interactions
- Code-level implementation focus
- SOLID principles and design patterns
When is LLD Asked?
- Junior to Mid-level (SDE-1, SDE-2)
- First rounds of interviews
- Machine coding rounds
- Some senior roles for specific companies
1️⃣ LLD Fundamentals (8 Topics) 🔴
Must Master
1. Object-Oriented Programming Principles
Key Concepts:
- Encapsulation
- Abstraction
- Inheritance
- Polymorphism
Interview Focus:
- When to use inheritance vs composition
- Abstract classes vs interfaces
- Access modifiers and their impact
Common Questions:
- "Explain polymorphism with a real-world example"
- "Why is composition preferred over inheritance?"
- "How does encapsulation improve code maintainability?"
2. SOLID Principles 🔥🔥🔥
Most Important for Interviews:
S - Single Responsibility Principle
- A class should have only one reason to change
- Example: Separate UserService from EmailService
O - Open/Closed Principle
- Open for extension, closed for modification
- Use interfaces and abstract classes
L - Liskov Substitution Principle
- Subtypes must be substitutable for base types
- Important for inheritance hierarchies
I - Interface Segregation
- Many specific interfaces better than one general
- Don't force clients to depend on unused methods
D - Dependency Inversion
- Depend on abstractions, not concretions
- Use dependency injection
Interview Tips:
- Always mention SOLID when discussing design
- Give examples from previous projects
- Show how it improves testability
3. UML Diagrams
Must Know:
- Class diagrams (relationships, multiplicity)
- Sequence diagrams (interaction flows)
- Use case diagrams (system boundaries)
Key Relationships:
- Association (has-a)
- Aggregation (weak has-a)
- Composition (strong has-a)
- Inheritance (is-a)
- Dependency (uses-a)
Tools:
- Draw.io
- Lucidchart
- PlantUML (for code-to-diagram)
4. Class Relationships
Association: Teacher <--> Student (bidirectional)
Aggregation: Department o-- Employee (weak ownership)
Composition: House *-- Room (strong ownership)
Inheritance: Dog --|> Animal (is-a relationship)
Dependency: OrderService ..> EmailService (uses)
Interview Questions:
- "What's the difference between aggregation and composition?"
- "When would you use composition over inheritance?"
5. Design Principles
DRY (Don't Repeat Yourself)
- Extract common code into reusable components
- Use inheritance or composition
KISS (Keep It Simple, Stupid)
- Simplest solution that works
- Avoid over-engineering
YAGNI (You Aren't Gonna Need It)
- Don't add functionality until needed
- Avoid premature optimization
Law of Demeter
- Only talk to immediate friends
- Minimize coupling
6. Exception Handling & Error Management
Best Practices:
- Use specific exceptions
- Don't catch generic exceptions
- Clean up resources (try-with-resources)
- Log appropriately
Interview Focus:
- Checked vs unchecked exceptions
- When to create custom exceptions
- Error propagation strategies
7. Concurrency & Thread Safety
Key Topics:
- Synchronization
- Race conditions
- Deadlocks
- Thread-safe collections
- Immutability
Common Patterns:
- Singleton with thread safety
- Producer-Consumer pattern
- Thread pools
8. Testing & Testability
Principles:
- Write testable code
- Use dependency injection
- Mock external dependencies
- Unit tests vs integration tests
Interview Questions:
- "How do you make your code testable?"
- "What's the difference between mocking and stubbing?"
2️⃣ LLD Design Problems (15 Problems) 🔴
Category A: Object-Oriented Design (Must-Do)
1. Parking Lot System 🔥🔥🔥
Difficulty: Medium | Frequency: Very High
Requirements:
- Multiple floors with parking spots
- Different vehicle types (car, truck, motorcycle, electric)
- Different spot types (compact, large, handicapped, electric)
- Entry/exit with ticket
- Pricing strategy
- Find available spots
- Spot reservation
Key Classes:
ParkingLot, Floor, ParkingSpot, Vehicle
Ticket, Payment, PricingStrategy
VehicleType (enum), SpotType (enum)
Important Concepts:
- Strategy pattern (pricing)
- Factory pattern (vehicle/spot creation)
- Singleton (ParkingLot)
- Observer pattern (availability notifications)
Interview Focus:
- How to handle concurrent requests?
- How to find nearest available spot?
- Database schema design
- Extend for electric vehicle charging
Common Follow-ups:
- "How would you handle peak hours?"
- "Design a reservation system"
- "Add a payment gateway"
- "Handle handicapped spot priority"
2. Library Management System 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Add/remove books
- Search books (title, author, ISBN)
- Issue/return books
- Multiple copies of same book
- Member management
- Late fee calculation
- Reservation system
Key Classes:
Library, Book, BookItem, Member
Librarian, Catalog, Search
Lending, Reservation, Fine
Important Concepts:
- Strategy pattern (search strategies)
- Observer pattern (availability notifications)
- State pattern (book states: available, issued, reserved)
Interview Focus:
- How to handle multiple copies?
- Search optimization
- Late fee calculation
- Extend for ebooks
3. Hotel Management System 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Room booking
- Different room types
- Search available rooms
- Booking cancellation
- Guest management
- Housekeeping management
- Room service
Key Classes:
Hotel, Room, RoomType, Booking
Guest, Receptionist, Housekeeper
RoomService, Payment
Important Concepts:
- State pattern (room states)
- Factory pattern (room creation)
- Strategy pattern (pricing)
- Observer pattern (housekeeping alerts)
Interview Focus:
- Handling concurrent bookings
- Overbooking strategy
- Dynamic pricing
- Integration with payment systems
4. Elevator System 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Requirements:
- Multiple elevators
- Up/down buttons on each floor
- Destination buttons inside elevator
- Optimal elevator selection
- Emergency stop
- Weight limit
- Door open/close
Key Classes:
ElevatorSystem, Elevator, Floor
Button, Request, Direction (enum)
ElevatorController, Scheduler
Important Concepts:
- Strategy pattern (scheduling algorithm)
- State pattern (elevator states)
- Command pattern (requests)
- Observer pattern (floor updates)
Scheduling Algorithms:
- FCFS (First Come First Serve)
- SCAN (elevator algorithm)
- LOOK algorithm
- Destination dispatch
Interview Focus:
- Optimal scheduling algorithm
- Handle multiple requests
- Emergency scenarios
- Energy optimization
Common Follow-ups:
- "How would you optimize for peak hours?"
- "Design for high-rise buildings"
- "Add priority for emergency services"
5. ATM System 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Cash withdrawal
- Balance inquiry
- PIN verification
- Cash deposit
- Mini statement
- Card reader
- Cash dispenser
Key Classes:
ATM, Card, Account, Bank
Transaction, CashDispenser
CardReader, Screen, Keypad
Important Concepts:
- State pattern (ATM states)
- Chain of Responsibility (cash dispensing)
- Proxy pattern (bank connection)
- Command pattern (transactions)
Interview Focus:
- Security considerations
- Handling insufficient cash
- Network failures
- Concurrent withdrawals
6. Online Shopping System (E-commerce) 🔥🔥🔥
Difficulty: Medium-Hard | Frequency: Very High
Requirements:
- Product catalog
- Shopping cart
- Order management
- Payment processing
- Inventory management
- User accounts
- Search and filter
- Notifications
Key Classes:
Product, Category, ShoppingCart
Order, OrderItem, Payment
User, Seller, Admin
Inventory, Notification
Important Concepts:
- Strategy pattern (payment, shipping)
- Observer pattern (inventory, notifications)
- Factory pattern (product types)
- Decorator pattern (product customization)
Interview Focus:
- Handling cart abandonment
- Inventory synchronization
- Concurrent purchases
- Payment gateway integration
7. Car Rental System 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Search available vehicles
- Reserve vehicles
- Rental process
- Return process
- Calculate charges
- Late fees
- Vehicle maintenance
- Multiple locations
Key Classes:
Vehicle, Reservation, Branch
Customer, RentalTransaction
VehicleType, Insurance
Important Concepts:
- State pattern (vehicle states)
- Strategy pattern (pricing)
- Factory pattern (vehicle types)
Interview Focus:
- Handling overlapping reservations
- Dynamic pricing
- Maintenance scheduling
- Multi-location management
8. Movie Ticket Booking System 🔥🔥🔥
Difficulty: Medium | Frequency: Very High
Requirements:
- List movies and showtimes
- Select seats
- Book tickets
- Payment processing
- Cancellation
- Multiple cinema halls
- Different pricing (weekday/weekend)
- Food ordering
Key Classes:
Movie, Show, Theater, Hall
Seat, Booking, Payment
Customer, Admin
Important Concepts:
- Strategy pattern (pricing)
- State pattern (seat states)
- Observer pattern (seat availability)
- Factory pattern (ticket types)
Interview Focus:
- Concurrent seat booking (locking mechanism)
- Seat selection UI/UX
- Cancellation policy
- Dynamic pricing
Common Follow-ups:
- "How to handle seat blocking during booking?"
- "Design for multiple cinema chains"
- "Add recommendation system"
Category B: Design Patterns Implementation (Important)
9. Vending Machine 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Select product
- Insert money (coins/notes)
- Dispense product
- Return change
- Handle insufficient money
- Product inventory
Key Classes:
VendingMachine, Product, Inventory
State (Idle, HasMoney, Dispensing, OutOfStock)
Coin, Note
Important Concepts:
- State pattern (machine states) 🔥
- Strategy pattern (payment)
- Singleton (machine instance)
States:
- Idle
- HasMoney
- Dispensing
- OutOfStock
Interview Focus:
- State transitions
- Change calculation
- Concurrent access
- Inventory management
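The state transitions listed above can be sketched with an enum in which each state overrides only the events it accepts. This is a minimal illustration, not a full vending machine: the event names (`insertMoney`, `selectProduct`, `dispensed`, `cancel`) and the transition table are assumptions chosen for the example.

```java
// Sketch of the State pattern with an enum. Each constant overrides only the
// events that are legal in that state. Event names and transitions are
// assumptions for illustration.
enum MachineState {
    IDLE {
        @Override MachineState insertMoney() { return HAS_MONEY; }
    },
    HAS_MONEY {
        @Override MachineState selectProduct(boolean inStock) {
            return inStock ? DISPENSING : OUT_OF_STOCK;
        }
        @Override MachineState cancel() { return IDLE; }   // refund and reset
    },
    DISPENSING {
        @Override MachineState dispensed() { return IDLE; }
    },
    OUT_OF_STOCK {
        @Override MachineState cancel() { return IDLE; }   // refund and reset
    };

    // Defaults: an event that a state does not override is rejected.
    MachineState insertMoney() { throw invalid("insertMoney"); }
    MachineState selectProduct(boolean inStock) { throw invalid("selectProduct"); }
    MachineState dispensed() { throw invalid("dispensed"); }
    MachineState cancel() { throw invalid("cancel"); }

    private IllegalStateException invalid(String event) {
        return new IllegalStateException(event + " not allowed in state " + this);
    }
}
```

Keeping the transition logic inside each state avoids a growing `switch` in the machine class, which is usually the point interviewers probe when they ask about state transitions.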
10. Chess Game 🔥🔥
Difficulty: Hard | Frequency: Medium
Requirements:
- Valid moves for each piece
- Check and checkmate detection
- Castling, en passant
- Pawn promotion
- Game state management
- Move history
Key Classes:
Game, Board, Square, Piece
Player, Move, GameState
King, Queen, Rook, Bishop, Knight, Pawn
Important Concepts:
- Strategy pattern (piece moves)
- Command pattern (moves)
- Memento pattern (undo)
- State pattern (game states)
Interview Focus:
- Valid move calculation
- Check detection algorithm
- AI opponent (optional)
11. Snake & Ladder Game 🔥
Difficulty: Easy-Medium | Frequency: Medium
Requirements:
- Board with 100 cells
- Snakes and ladders
- Multiple players
- Dice roll
- Win condition
- Game state
Key Classes:
Game, Board, Player, Dice
Snake, Ladder, Cell
Important Concepts:
- Strategy pattern (dice roll)
- Observer pattern (player position updates)
12. Notification Service 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Multiple channels (Email, SMS, Push)
- Priority levels
- Retry mechanism
- Template management
- User preferences
- Delivery status
Key Classes:
Notification, NotificationService
EmailChannel, SMSChannel, PushChannel
Template, UserPreference
DeliveryStatus
Important Concepts:
- Strategy pattern (channels)
- Observer pattern (status updates)
- Factory pattern (channel creation)
- Template method (notification sending)
- Chain of Responsibility (retry logic)
Interview Focus:
- Handle failures gracefully
- Rate limiting
- User preference management
- Scale to millions of notifications
Category C: Real-World Applications (Nice to Have)
13. Logging Framework 🔥🔥
Difficulty: Medium | Frequency: Medium
Requirements:
- Multiple log levels (DEBUG, INFO, WARN, ERROR)
- Multiple output targets (console, file, database)
- Log formatting
- Log rotation
- Configuration
- Async logging
Key Classes:
Logger, LogLevel, LogAppender
ConsoleAppender, FileAppender
LogFormatter, Configuration
Important Concepts:
- Singleton (Logger instance)
- Strategy pattern (appenders)
- Builder pattern (log configuration)
- Chain of Responsibility (log levels)
- Observer pattern (multiple appenders)
14. Cache System (LRU Cache) 🔥🔥🔥
Difficulty: Medium | Frequency: Very High
Requirements:
- Get and Put in O(1)
- Evict least recently used
- Capacity limit
- Thread safety (optional)
- TTL support (optional)
Key Classes:
Cache, CacheEntry
DoublyLinkedList, HashMap
EvictionPolicy
Important Concepts:
- Strategy pattern (eviction policies)
- Singleton (cache instance)
Eviction Policies:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO (First In First Out)
- Random
Interview Focus:
- HashMap + Doubly Linked List implementation
- Thread safety with ReadWriteLock
- Generics for type safety
- Memory management
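The HashMap + doubly linked list approach can be sketched as below. The class name `LruCache` and its method names are hypothetical, and the sketch is single-threaded: a thread-safe version would need a lock (for example the ReadWriteLock mentioned above) around `get` and `put`.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an LRU cache: a HashMap gives O(1) lookup, and a doubly linked
// list keeps entries ordered from most to least recently used.
final class LruCache<K, V> {
    private static final class Node<K, V> {
        K key;
        V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> index = new HashMap<>();
    // Sentinel nodes remove null checks at both ends of the list.
    private final Node<K, V> head = new Node<>(null, null);
    private final Node<K, V> tail = new Node<>(null, null);

    LruCache(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    // Returns the cached value or null; a hit becomes the most recent entry.
    V get(K key) {
        Node<K, V> node = index.get(key);
        if (node == null) return null;
        unlink(node);
        addFirst(node);
        return node.value;
    }

    void put(K key, V value) {
        Node<K, V> node = index.get(key);
        if (node != null) {            // update in place and refresh recency
            node.value = value;
            unlink(node);
            addFirst(node);
            return;
        }
        if (index.size() == capacity) {
            Node<K, V> lru = tail.prev; // least recently used entry
            unlink(lru);
            index.remove(lru.key);
        }
        node = new Node<>(key, value);
        index.put(key, node);
        addFirst(node);
    }

    int size() { return index.size(); }

    private void unlink(Node<K, V> node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    private void addFirst(Node<K, V> node) {
        node.next = head.next;
        node.prev = head;
        head.next.prev = node;
        head.next = node;
    }
}
```

Both operations touch a constant number of pointers and one map entry, which is what makes them O(1).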
15. Meeting Scheduler 🔥🔥
Difficulty: Medium | Frequency: High
Requirements:
- Check availability
- Book meeting rooms
- Invite participants
- Handle conflicts
- Recurring meetings
- Cancellation
Key Classes:
MeetingRoom, Meeting, Participant
Calendar, TimeSlot, Booking
Scheduler
Important Concepts:
- Strategy pattern (conflict resolution)
- Observer pattern (participant notifications)
- Factory pattern (meeting types)
Interview Focus:
- Interval overlap detection
- Optimal room allocation
- Handle time zones
- Recurring meetings logic
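Interval overlap detection reduces to one comparison per pair of slots. The sketch below assumes half-open intervals (the end time is exclusive, so back-to-back meetings do not conflict); `RoomCalendar` and `tryBook` are hypothetical names, and the linear scan is only for illustration.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// A half-open time interval [start, end).
final class TimeSlot {
    final LocalDateTime start;
    final LocalDateTime end;

    TimeSlot(LocalDateTime start, LocalDateTime end) {
        if (!start.isBefore(end)) throw new IllegalArgumentException("start must be before end");
        this.start = start;
        this.end = end;
    }

    // Two half-open intervals overlap when each starts before the other ends.
    boolean overlaps(TimeSlot other) {
        return start.isBefore(other.end) && other.start.isBefore(end);
    }
}

// Hypothetical per-room calendar used only for this example.
final class RoomCalendar {
    private final List<TimeSlot> booked = new ArrayList<>();

    // Books the slot and returns true, or returns false on a conflict.
    // Linear scan for clarity; a sorted map or interval tree scales better.
    synchronized boolean tryBook(TimeSlot slot) {
        for (TimeSlot existing : booked) {
            if (existing.overlaps(slot)) return false;
        }
        booked.add(slot);
        return true;
    }
}
```

Storing times in UTC and converting at the edges is one common way to keep this check independent of participant time zones.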
3️⃣ Design Patterns (12 Patterns) 🟡
Creational Patterns
1. Singleton Pattern 🔥🔥🔥
Use Cases: Database connection, Logger, Configuration manager
Thread-Safe Implementation:
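public class Singleton {
    // volatile ensures other threads never observe a partially constructed instance
    private static volatile Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, no locking
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}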
Interview Questions:
- Why double-checked locking?
- Why volatile keyword?
- Bill Pugh Singleton (Inner class)
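The Bill Pugh variant mentioned in the last question is the initialization-on-demand holder idiom. A minimal sketch, using a hypothetical `ConfigManager` class:

```java
// Initialization-on-demand holder idiom (often called the Bill Pugh Singleton).
// ConfigManager is a hypothetical example class.
final class ConfigManager {
    private ConfigManager() {}

    // The JVM initializes Holder only when getInstance() first touches it.
    // Class initialization is thread-safe, so no locking or volatile is needed.
    private static final class Holder {
        private static final ConfigManager INSTANCE = new ConfigManager();
    }

    static ConfigManager getInstance() {
        return Holder.INSTANCE;
    }
}
```

Compared with double-checked locking, this keeps lazy initialization while leaving the synchronization to the class loader.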
2. Factory Pattern 🔥🔥
Use Cases: Creating objects without specifying exact class
When to Use:
- Vehicle creation (Car, Truck, Motorcycle)
- Payment method (Credit, Debit, UPI)
- Notification channel (Email, SMS, Push)
3. Abstract Factory Pattern 🔥
Use Cases: Creating families of related objects
Example: UI components for different OS (Windows, Mac, Linux)
4. Builder Pattern 🔥🔥
Use Cases: Complex object construction
Example: Building a complex query, HTTP request, Pizza order
When to Use:
- Many constructor parameters
- Optional parameters
- Immutable objects
5. Prototype Pattern 🔥
Use Cases: Cloning objects instead of creating new
Example: Document templates, Game characters
Structural Patterns
6. Adapter Pattern 🔥🔥
Use Cases: Making incompatible interfaces work together
Example:
- Legacy system integration
- Third-party library integration
- XML to JSON converter
7. Decorator Pattern 🔥🔥
Use Cases: Adding behavior dynamically
Example:
- Pizza toppings (base + cheese + olives)
- Coffee add-ons (coffee + milk + sugar)
- Stream decorators (BufferedInputStream)
8. Proxy Pattern 🔥
Use Cases: Controlling access to objects
Types:
- Virtual Proxy (lazy loading)
- Protection Proxy (access control)
- Remote Proxy (remote objects)
Example: Image lazy loading, Database connection pooling
Behavioral Patterns
9. Strategy Pattern 🔥🔥🔥
Use Cases: Selecting algorithm at runtime
Examples:
- Payment methods (Credit, Debit, UPI, Wallet)
- Sorting strategies (QuickSort, MergeSort)
- Pricing strategies (Regular, Holiday, Member)
- Compression algorithms (ZIP, RAR, 7Z)
Most Important for Interviews!
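A minimal sketch of the pattern, using the pricing example from the list above. The interface name, class names, and the percentages are illustrative assumptions; amounts are kept in cents to avoid floating-point rounding.

```java
// Strategy pattern sketch: the algorithm sits behind an interface and can be
// swapped at runtime. Names and percentages are assumptions for illustration.
interface PricingStrategy {
    long priceCents(long baseCents);
}

final class RegularPricing implements PricingStrategy {
    public long priceCents(long baseCents) { return baseCents; }
}

final class HolidayPricing implements PricingStrategy {
    public long priceCents(long baseCents) { return baseCents * 120 / 100; } // +20%
}

final class MemberPricing implements PricingStrategy {
    public long priceCents(long baseCents) { return baseCents * 90 / 100; }  // -10%
}

// The context depends only on the interface, never on a concrete strategy.
final class Checkout {
    private PricingStrategy strategy;

    Checkout(PricingStrategy strategy) { this.strategy = strategy; }

    void setStrategy(PricingStrategy strategy) { this.strategy = strategy; }

    long total(long baseCents) { return strategy.priceCents(baseCents); }
}
```

Because `PricingStrategy` has a single method, a lambda such as `base -> base / 2` also works as a one-off strategy, which is worth mentioning when asked how new pricing rules would be added.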
10. Observer Pattern 🔥🔥🔥
Use Cases: One-to-many dependency
Examples:
- Event listeners
- Stock price updates
- Notification system
- MVC architecture
Implementation: Subject and Observer interfaces
11. State Pattern 🔥🔥
Use Cases: Object behavior changes with state
Examples:
- Vending machine states
- Order states (Pending, Processing, Shipped, Delivered)
- Traffic light states
- Connection states
12. Command Pattern 🔥
Use Cases: Encapsulating requests as objects
Examples:
- Undo/Redo functionality
- Task scheduling
- Remote control operations
HIGH-LEVEL DESIGN (HLD)
Understanding HLD
What is HLD?
- System architecture at a high level
- Component interactions
- Scalability and reliability
- Trade-offs and constraints
When is HLD Asked?
- Mid to Senior level (SDE-2, SDE-3, Staff)
- Final rounds of interviews
- Architect roles
- Leadership positions
4️⃣ HLD Fundamentals (10 Topics) 🔴
1. System Design Framework (RESHADED) 🔥🔥🔥
R - Requirements (Functional & Non-Functional)
- What does the system do?
- Who are the users?
- Scale expectations?
E - Estimations (Back-of-envelope)
- QPS (Queries Per Second)
- Storage requirements
- Bandwidth
- Memory
S - System Interface (API Design)
- REST endpoints
- Parameters and responses
- Authentication
H - High-level Design (Architecture)
- Draw initial architecture
- Identify components
- Data flow
A - Architecture Deep Dive (Detailed Design)
- Deep dive into core components
- Algorithms and data structures
- Database schema
D - Database Design
- SQL vs NoSQL
- Schema design
- Partitioning strategy
E - Evaluation (Scalability & Bottlenecks)
- Identify bottlenecks
- Scale each component
- Trade-offs
D - Deep Dives
- Specific challenging aspects
- Edge cases
- Failure scenarios
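The estimation step is plain arithmetic, and it helps to have the formulas ready. The sketch below is one way to write them down; the `Estimator` class is hypothetical, and every workload number you feed it (requests per user, payload size, peak factor) is an assumption you should state aloud in the interview.

```java
// Back-of-envelope helpers. The formulas are standard; the inputs are
// assumptions supplied by the caller.
final class Estimator {
    static final long SECONDS_PER_DAY = 86_400;

    // Average requests per second for a given daily workload.
    static long averageQps(long dailyActiveUsers, double requestsPerUserPerDay) {
        return Math.round(dailyActiveUsers * requestsPerUserPerDay / SECONDS_PER_DAY);
    }

    // Peak load as a multiple of the average (2x to 3x is a common assumption).
    static long peakQps(long averageQps, double peakFactor) {
        return Math.round(averageQps * peakFactor);
    }

    // Raw storage growth per year in terabytes (10^12 bytes), before replication.
    static double storageTerabytesPerYear(long writesPerDay, long bytesPerWrite) {
        return writesPerDay * (double) bytesPerWrite * 365 / 1e12;
    }
}
```

For example, assuming 200M DAU, 2 writes per user per day, and 300 bytes per write: 400M writes/day is about 4,630 writes per second on average, about 9,260 at a 2x peak, and roughly 43.8 TB of new data per year before replication.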
2. Scalability Principles 🔥🔥🔥
Vertical Scaling (Scale Up)
- Add more CPU, RAM, Disk
- Limitations: Hardware limits, downtime
- When to use: Quick fix, monolithic apps
Horizontal Scaling (Scale Out)
- Add more machines
- Benefits: No single point of failure
- Challenges: Data consistency, session management
Key Concepts:
- Stateless services
- Load balancing
- Caching layers
- Database replication
- Microservices
3. Load Balancing 🔥🔥🔥
Purpose:
- Distribute traffic across servers
- Health checks
- SSL termination
Algorithms:
- Round Robin
- Least Connections
- Weighted Round Robin
- IP Hash
- Least Response Time
Types:
- L4 (Transport layer) - Fast, TCP/UDP
- L7 (Application layer) - Smart, HTTP/HTTPS
Popular Solutions:
- NGINX
- HAProxy
- AWS ELB/ALB
- Azure Load Balancer
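Two of the algorithms above, Round Robin and IP Hash, fit in a few lines. This is a toy sketch with a hypothetical `SimpleBalancer` class and placeholder server names; real load balancers add health checks, weights, and connection tracking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy load balancer illustrating Round Robin and IP Hash selection.
final class SimpleBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    SimpleBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("at least one server required");
        this.servers = new ArrayList<>(servers);
    }

    // Round Robin: cycle through the servers in order.
    // floorMod keeps the index non-negative even after the counter overflows.
    String nextRoundRobin() {
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // IP Hash: the same client address always maps to the same server,
    // as long as the server list does not change.
    String forClientIp(String clientIp) {
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }
}
```

IP Hash gives session stickiness without shared state, but adding or removing a server remaps most clients, which is the usual motivation for consistent hashing.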
4. Caching 🔥🔥🔥
Cache Levels:
- Browser cache
- CDN cache
- Application cache (Redis, Memcached)
- Database cache
Cache Strategies:
Read Strategies:
- Cache Aside (Lazy Loading)
- Read Through
Write Strategies:
- Write Through (write to cache + DB)
- Write Back (write to cache, async to DB)
- Write Around (write to DB, invalidate cache)
Eviction Policies:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO
- TTL (Time To Live)
Cache Invalidation:
- Time-based (TTL)
- Event-based
- Manual purge
Popular Tools:
- Redis
- Memcached
- Varnish
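The Cache Aside strategy from the list above can be sketched as follows. The in-memory map stands in for Redis or Memcached and the `loader` function stands in for a database query; both, along with the `CacheAside` class name, are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: the application checks the cache first, loads from the
// source of truth on a miss, and populates the cache itself.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for Redis/Memcached
    private final Function<K, V> loader;                       // stand-in for a DB query

    CacheAside(Function<K, V> loader) { this.loader = loader; }

    // Read path: cache hit returns immediately; a miss loads and populates.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) cache.put(key, value);
        }
        return value;
    }

    // Write path: the caller updates the database first, then drops the
    // cached copy so the next read reloads fresh data.
    void invalidate(K key) { cache.remove(key); }
}
```

Invalidating on write rather than updating the cache avoids writing stale values when two writers race, at the cost of one extra miss.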
5. Database Design 🔥🔥🔥
SQL vs NoSQL Decision Tree:
Use SQL When:
- ACID transactions required
- Complex queries with JOINs
- Structured data
- Consistency over availability
- Examples: Banking, E-commerce orders
Use NoSQL When:
- High write throughput
- Flexible schema
- Horizontal scaling
- Availability over consistency
- Examples: Social media feeds, Logging
NoSQL Types:
- Document DB: MongoDB, CouchDB (use: user profiles, product catalogs)
- Key-Value: Redis, DynamoDB (use: session storage, caching)
- Column-Family: Cassandra, HBase (use: time-series data, analytics)
- Graph DB: Neo4j, Amazon Neptune (use: social networks, recommendation engines)
Database Scaling:
Read Scaling:
- Read replicas
- Master-Slave replication
- Database caching
Write Scaling:
- Sharding (horizontal partitioning)
- Partitioning strategies:
- Range-based
- Hash-based
- Directory-based
Replication:
- Master-Slave
- Master-Master
- Quorum-based
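The hash-based and range-based partitioning strategies can be illustrated with a small router. This is a sketch under assumptions: `ShardRouter` is a hypothetical class, CRC32 is used only as an example of a stable hash, and shard names are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Sketch of two partitioning strategies: hash-based and range-based.
final class ShardRouter {
    // Range-based: maps the lowest id of each range to the shard that owns it.
    private final TreeMap<Long, String> rangeStarts = new TreeMap<>();

    // Hash-based: a stable hash of the key modulo the shard count.
    // The result is always in [0, shardCount).
    static int hashShard(String key, int shardCount) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardCount);
    }

    void addRange(long firstId, String shardName) {
        rangeStarts.put(firstId, shardName);
    }

    // Finds the range whose starting id is the greatest one not above the given id.
    String rangeShard(long id) {
        Map.Entry<Long, String> entry = rangeStarts.floorEntry(id);
        if (entry == null) throw new IllegalArgumentException("no shard covers id " + id);
        return entry.getValue();
    }
}
```

Hash-based routing spreads load evenly but makes range scans and resharding expensive; range-based routing keeps neighboring keys together but can create hot shards.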
6. Message Queues 🔥🔥
Purpose:
- Asynchronous communication
- Decouple services
- Rate limiting
- Retry logic
Patterns:
- Producer-Consumer
- Pub-Sub
- Request-Reply
Use Cases:
- Email notifications
- Image processing
- Order processing
- Log aggregation
Popular Tools:
- Apache Kafka (high throughput, streaming)
- RabbitMQ (flexible routing)
- AWS SQS (managed)
- Redis Pub-Sub (lightweight)
Kafka Deep Dive:
- Topics and partitions
- Consumer groups
- Offset management
- Retention policies
7. Microservices Architecture 🔥🔥
Benefits:
- Independent deployment
- Technology diversity
- Scalability
- Fault isolation
Challenges:
- Network latency
- Data consistency
- Debugging complexity
- Testing
Key Patterns:
- API Gateway
- Service Discovery (Consul, Eureka)
- Circuit Breaker (Hystrix, now in maintenance mode; Resilience4j is a common replacement)
- Saga pattern (distributed transactions)
Communication:
- Synchronous: REST, gRPC
- Asynchronous: Message queues, Event streams
8. API Design 🔥🔥
REST Principles:
- Stateless
- Resource-based URLs
- HTTP methods (GET, POST, PUT, DELETE)
- HTTP status codes
- HATEOAS
Best Practices:
- Versioning (/api/v1/)
- Pagination
- Rate limiting
- Authentication (JWT, OAuth)
- Error handling
API Gateway:
- Single entry point
- Authentication
- Rate limiting
- Request routing
- Response aggregation
GraphQL vs REST:
- GraphQL: Flexible queries, single endpoint
- REST: Cacheable, well-established
9. CAP Theorem 🔥🔥
Three Properties:
- Consistency: All nodes see same data
- Availability: Every request gets a response
- Partition Tolerance: System works despite network failures
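Reality: Network partitions cannot be ruled out in a distributed system, so the practical choice is between Consistency and Availability while a partition lasts (commonly summarized as "choose 2 out of 3")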
Examples:
- CP: Banking systems (Consistency + Partition tolerance)
- AP: Social media feeds (Availability + Partition tolerance)
- CA: Single-node database (not distributed)
PACELC Theorem:
- Extension of CAP
- If Partition, choose A or C
- Else (no partition), choose Latency or Consistency
10. Consistency Patterns 🔥🔥
Strong Consistency:
- All reads return latest write
- Example: Banking transactions
- Achieved: Single-leader replication, Paxos/Raft
Eventual Consistency:
- Reads may return stale data temporarily
- Example: Social media likes, DNS
- Achieved: Multi-leader, Leaderless replication
Consistency Models:
- Linearizability (strongest)
- Sequential Consistency
- Causal Consistency
- Eventual Consistency (weakest)
5️⃣ HLD Design Problems (20 Problems) 🔴
Category A: Social Media & Content (Must-Do)
1. Design Twitter / X 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- Post tweets (140/280 characters)
- Follow/unfollow users
- Timeline (home feed)
- Like, retweet, reply
- Trending topics
- Search tweets
Non-Functional Requirements:
- 200M DAU
- High availability (99.99%)
- Low latency for reads (<100ms)
- Eventual consistency acceptable
Key Components:
API Gateway → Application Servers
Tweet Service, Timeline Service, Follow Service
User Service, Notification Service
Redis Cache, PostgreSQL/Cassandra
S3 for media, CDN
Kafka for async processing
Database Design:
Users: user_id, username, bio, followers_count
Tweets: tweet_id, user_id, content, created_at
Follows: follower_id, followee_id, created_at
Likes: user_id, tweet_id
Timeline Generation:
- Fan-out on Write: Pre-compute timelines, fast reads
- Push model: Write to all followers' timelines
- Good for users with few followers
- Fan-out on Read: Compute on demand, slow reads
- Pull model: Fetch tweets on read
- Good for celebrities with millions of followers
- Hybrid: Fan-out for normal users, pull for celebrities
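The hybrid model can be sketched in memory to make the push/pull split concrete. Everything here is a simplification under assumptions: `TimelineService` is a hypothetical class, the celebrity threshold is a parameter you would tune, tweets are bare ids, and ordering by timestamp is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hybrid fan-out sketch: push tweets from normal users into follower
// timelines at write time, pull tweets from celebrities at read time.
final class TimelineService {
    private final int celebrityThreshold;
    private final Map<String, Set<String>> followersOf = new HashMap<>();     // author -> followers
    private final Map<String, Set<String>> followingOf = new HashMap<>();     // user -> authors
    private final Map<String, List<String>> tweetsBy = new HashMap<>();       // author -> tweet ids
    private final Map<String, List<String>> pushedTimeline = new HashMap<>(); // precomputed feeds

    TimelineService(int celebrityThreshold) { this.celebrityThreshold = celebrityThreshold; }

    void follow(String follower, String author) {
        followersOf.computeIfAbsent(author, a -> new LinkedHashSet<>()).add(follower);
        followingOf.computeIfAbsent(follower, f -> new LinkedHashSet<>()).add(author);
    }

    private boolean isCelebrity(String author) {
        return followersOf.getOrDefault(author, Collections.emptySet()).size() >= celebrityThreshold;
    }

    void post(String author, String tweetId) {
        tweetsBy.computeIfAbsent(author, a -> new ArrayList<>()).add(tweetId);
        if (isCelebrity(author)) return; // pull model: no write amplification
        for (String follower : followersOf.getOrDefault(author, Collections.emptySet())) {
            pushedTimeline.computeIfAbsent(follower, f -> new ArrayList<>()).add(tweetId);
        }
    }

    // Merge the precomputed feed with tweets pulled from followed celebrities.
    // The set removes duplicates if an author crossed the threshold after posting.
    List<String> timeline(String user) {
        Set<String> merged = new LinkedHashSet<>(pushedTimeline.getOrDefault(user, Collections.emptyList()));
        for (String author : followingOf.getOrDefault(user, Collections.emptySet())) {
            if (isCelebrity(author)) merged.addAll(tweetsBy.getOrDefault(author, Collections.emptyList()));
        }
        return new ArrayList<>(merged);
    }

    int pushedCount(String user) {
        return pushedTimeline.getOrDefault(user, Collections.emptyList()).size();
    }
}
```

In a real system the pushed timelines would live in a cache such as Redis and the merge would sort by timestamp, but the write-time versus read-time trade-off is the same.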
Scalability:
- Shard by user_id or tweet_id
- Cache timelines in Redis
- CDN for media files
- Read replicas for followers count
Interview Focus:
- Timeline generation algorithm
- Handle celebrity problem (Bieber problem)
- Trending topics algorithm
- Real-time updates (WebSockets)
Common Follow-ups:
- "How would you implement trending topics?"
- "Design the search feature"
- "Handle viral tweets"
- "Design analytics for tweets"
2. Design Instagram 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- Upload/view photos and videos
- Follow users
- News feed
- Like, comment
- Stories (24-hour ephemeral)
- Direct messaging
Non-Functional Requirements:
- 500M DAU
- Low latency for image loading
- High storage (petabytes of images)
- Reliable uploads
Key Components:
Image Upload Service
Feed Generation Service
User Service
CDN (Cloudflare, Akamai)
S3/Blob Storage
Redis Cache
PostgreSQL + Cassandra
Image Storage:
- Original images in S3
- Multiple sizes (thumbnail, medium, full)
- CDN for fast delivery
- Pre-signed URLs for uploads
Feed Ranking:
- Chronological (early Instagram)
- ML-based ranking (current)
- User engagement history
- Post recency
- Relationship strength
- Post type (photo, video, reel)
Stories:
- Ephemeral storage (24 hours)
- Separate storage system
- Ring buffer for efficiency
Scalability:
- Geo-distributed CDNs
- Image sharding by user_id
- Separate read/write databases
- Cache frequently accessed feeds
Interview Focus:
- Image upload optimization
- Feed ranking algorithm
- Stories implementation
- Handle high read:write ratio (100:1)
3. Design YouTube / Netflix 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- Upload videos
- Stream videos (adaptive bitrate)
- Search videos
- Recommendations
- Comments, likes
- Subscriptions
Non-Functional Requirements:
- 2B+ users
- High bandwidth
- Low latency streaming
- 99.9% availability
- Support multiple resolutions (360p to 4K)
Key Components:
Video Upload Service → Transcoding Service
Video Streaming Service (HLS/DASH)
CDN (Akamai, Cloudflare)
Recommendation Engine
Search Service (Elasticsearch)
Metadata DB (Cassandra)
Object Storage (S3)
Kafka for analytics
Video Processing Pipeline:
- Upload → S3
- Transcoding (FFmpeg)
- Multiple resolutions (360p, 480p, 720p, 1080p, 4K)
- Multiple codecs (H.264, H.265, VP9)
- Adaptive bitrate streaming (HLS, DASH)
- Thumbnail generation
- Content moderation (AI/ML)
- Store in distributed storage
- Update metadata DB
- Invalidate CDN cache
Streaming:
- Adaptive Bitrate Streaming (ABR)
- HLS (HTTP Live Streaming) - Apple
- DASH (Dynamic Adaptive Streaming over HTTP)
- Client adjusts quality based on bandwidth
- Chunked delivery (2-10 second segments)
CDN Architecture:
- Multi-tier CDN
- Edge locations worldwide
- Popular videos cached at edge
- Long-tail videos served from origin
Recommendation System:
- Collaborative filtering
- Content-based filtering
- Deep learning models
- Real-time and batch processing
Scalability:
- Video sharding by video_id
- Geo-distributed CDNs
- Multiple data centers
- Read replicas for metadata
Interview Focus:
- Transcoding pipeline optimization
- Adaptive bitrate streaming
- CDN strategy
- Recommendation algorithm
- Cost optimization (storage + bandwidth)
Common Follow-ups:
- "How to handle live streaming?"
- "Design the recommendation system"
- "Handle copyright detection"
- "Optimize for mobile bandwidth"
4. Design Facebook / Meta 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- News feed
- Post (text, images, videos)
- Like, comment, share
- Friend requests
- Notifications
- Groups
- Messenger integration
Non-Functional Requirements:
- 3B+ users
- High availability
- Low latency (<200ms)
- Strong consistency for friend relationships
Key Components:
User Service
Post Service
News Feed Service
Friend Service
Notification Service
Graph Database (TAO)
MySQL Shards
Memcached/Redis
CDN
News Feed Algorithm:
- EdgeRank scoring:
- Affinity Score (relationship strength)
- Weight (content type)
- Time Decay
- ML-based ranking
- Personalization
Scalability:
- TAO (The Associations and Objects) - distributed graph
- MySQL sharding by user_id
- Feed caching in Memcached
- Async processing with queues
Interview Focus:
- Friend graph storage (TAO)
- News feed generation at scale
- Real-time notifications
- Consistency in friend relationships
Category B: E-commerce & Marketplaces (Must-Do)
5. Design Amazon / E-commerce Platform 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- Product catalog
- Search and filter
- Shopping cart
- Order management
- Payment processing
- Inventory management
- Recommendations
- Reviews and ratings
Non-Functional Requirements:
- 100M+ products
- 50M DAU
- High consistency for inventory
- Low latency for search
- 99.99% availability
Key Components:
Product Catalog Service
Search Service (Elasticsearch)
Cart Service
Order Service
Payment Service
Inventory Service
Recommendation Engine
Review Service
CDN for images
Database Design:
Products: product_id, name, description, price, category
Inventory: product_id, warehouse_id, quantity
Orders: order_id, user_id, status, total_amount
Order_Items: order_id, product_id, quantity, price
Users: user_id, name, email, addresses
Reviews: review_id, product_id, user_id, rating, comment
Search System:
- Elasticsearch for full-text search
- Filters (price, rating, brand)
- Autocomplete
- Typo tolerance
- Ranking algorithm
Cart Management:
- Store in Redis (session-based)
- Persistent cart in DB
- Cart expiration (30 days)
Inventory Management:
- Real-time inventory updates
- Reservation system during checkout
- Distributed locks to prevent overselling
- Eventual consistency for reads
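The core of preventing overselling is an atomic "decrement only if enough stock remains" step. The sketch below shows that idea inside a single process with a compare-and-set loop; it is an analogy, not a distributed lock. Across multiple servers the same check is typically done with a conditional database update or a lock service. `InventoryService` and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Single-process sketch of atomic inventory reservation.
final class InventoryService {
    private final ConcurrentHashMap<String, AtomicInteger> stock = new ConcurrentHashMap<>();

    void restock(String productId, int quantity) {
        stock.computeIfAbsent(productId, id -> new AtomicInteger()).addAndGet(quantity);
    }

    // Reserves units only if enough remain. The compare-and-set loop retries
    // when another thread changed the count between the read and the write.
    boolean reserve(String productId, int quantity) {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
        AtomicInteger available = stock.get(productId);
        if (available == null) return false;
        while (true) {
            int current = available.get();
            if (current < quantity) return false;   // would oversell
            if (available.compareAndSet(current, current - quantity)) return true;
        }
    }

    // Returns reserved units, for example when checkout times out or payment fails.
    void release(String productId, int quantity) {
        restock(productId, quantity);
    }

    int available(String productId) {
        AtomicInteger count = stock.get(productId);
        return count == null ? 0 : count.get();
    }
}
```

For the "concurrent checkouts for the last item" follow-up, the point to make is that exactly one `reserve` call can win, and the loser gets a clean failure instead of a negative stock count.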
Order Processing:
- Add to cart → Reserve inventory
- Checkout → Payment processing
- Payment success → Create order
- Update inventory → Send to warehouse
- Shipping → Delivery
Payment Flow:
- Payment gateway integration (Stripe, Razorpay)
- Idempotency for duplicate requests
- 3D Secure authentication
- Fraud detection
- Refund handling
Scalability:
- Product catalog in NoSQL (Cassandra)
- Shard by product_id or category
- Cache popular products
- Separate read/write databases
- CDN for product images
Interview Focus:
- Inventory consistency (prevent overselling)
- Search optimization
- Payment processing reliability
- Flash sales handling
- Recommendation algorithm
Common Follow-ups:
- "How to handle flash sales (e.g., iPhone launch)?"
- "Design the recommendation system"
- "Handle concurrent checkouts for last item"
- "Design fraud detection"
6. Design Uber / Ride-Sharing 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- Rider requests ride
- Match with nearby driver
- Real-time location tracking
- ETA calculation
- Fare calculation
- Rating system
- Payment
Non-Functional Requirements:
- Millions of rides per day
- Low latency for matching (<5 seconds)
- High availability
- Accurate location tracking
Key Components:
Rider Service
Driver Service
Matching Service
Location Service
Trip Service
Payment Service
Notification Service
QuadTree/Geohash for location
Kafka for real-time streams
Redis for caching
PostgreSQL/Cassandra
Location Services:
Geospatial Indexing:
- QuadTree
- Geohash
- S2 Geometry (Google)
Matching Algorithm:
- Rider requests ride
- Find nearby drivers (within 5km radius)
- Rank drivers by:
- Distance
- Driver rating
- Acceptance rate
- Send request to top 3-5 drivers
- First to accept gets the ride
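The filter-and-rank steps above can be sketched as follows. This example swaps the geospatial index for a linear scan with the haversine great-circle distance, which is fine for illustration but not for production scale. `Driver`, `DriverMatcher`, and the ranking order (distance first, then rating) are assumptions; acceptance rate is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical driver record for the example.
final class Driver {
    final String id;
    final double lat, lon, rating;

    Driver(String id, double lat, double lon, double rating) {
        this.id = id;
        this.lat = lat;
        this.lon = lon;
        this.rating = rating;
    }
}

final class DriverMatcher {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Keeps drivers within radiusKm, ranks by distance and then by rating
    // (higher first), and returns at most `limit` candidates.
    static List<Driver> candidates(List<Driver> drivers, double lat, double lon,
                                   double radiusKm, int limit) {
        List<Driver> nearby = new ArrayList<>();
        for (Driver d : drivers) {
            if (haversineKm(lat, lon, d.lat, d.lon) <= radiusKm) nearby.add(d);
        }
        nearby.sort(Comparator
                .comparingDouble((Driver d) -> haversineKm(lat, lon, d.lat, d.lon))
                .thenComparing(Comparator.comparingDouble((Driver d) -> d.rating).reversed()));
        return new ArrayList<>(nearby.subList(0, Math.min(limit, nearby.size())));
    }
}
```

With a QuadTree or Geohash index, only the cells around the rider are scanned before this ranking step, which is what keeps matching latency low in dense cities.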
Real-time Tracking:
- Drivers send location every 4-5 seconds
- WebSocket connection
- Update in Redis cache
- Persist in Cassandra (time-series)
ETA Calculation:
- Historical traffic data
- Real-time traffic (Google Maps API)
- Machine learning models
- Update dynamically
Fare Calculation:
- Base fare
- Per km/mile charge
- Per minute charge
- Surge pricing (demand-based)
- Tolls and taxes
Surge Pricing:
- Calculate demand/supply ratio per area
- Apply multiplier (1.2x, 1.5x, 2x)
- Update every minute
- Notify riders
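Fare and surge calculation combine into a short formula. In the sketch below the rates and the ratio thresholds that select each multiplier are made-up example values, and `FareCalculator` is a hypothetical class; only the multiplier tiers (1.2x, 1.5x, 2x) come from the list above.

```java
// Fare sketch: metered fare times a surge multiplier, plus tolls.
// All rates are example values in cents; thresholds are assumptions.
final class FareCalculator {
    static final long BASE_FARE_CENTS = 250;
    static final long PER_KM_CENTS = 120;
    static final long PER_MINUTE_CENTS = 30;

    // Maps the demand/supply ratio in an area to a multiplier tier.
    static double surgeMultiplier(int openRequests, int availableDrivers) {
        if (availableDrivers == 0) return 2.0;   // no supply: highest tier
        double ratio = (double) openRequests / availableDrivers;
        if (ratio >= 2.0) return 2.0;
        if (ratio >= 1.5) return 1.5;
        if (ratio >= 1.2) return 1.2;
        return 1.0;
    }

    // Surge applies to the metered part only; tolls are passed through as-is.
    static long fareCents(double km, double minutes, double surge, long tollsCents) {
        double metered = BASE_FARE_CENTS + km * PER_KM_CENTS + minutes * PER_MINUTE_CENTS;
        return Math.round(metered * surge) + tollsCents;
    }
}
```

For example, a 10 km, 20 minute trip at 1.5x surge with 100 cents in tolls comes to (250 + 1200 + 600) × 1.5 + 100 = 3175 cents under these example rates.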
Database Design:
Riders: rider_id, name, phone, rating
Drivers: driver_id, name, phone, vehicle, rating, location
Trips: trip_id, rider_id, driver_id, start_location, end_location, fare, status
Locations: driver_id, lat, long, timestamp (time-series)
Scalability:
- Shard by city/region (geosharding)
- QuadTree for each region
- Separate services per city
- Real-time location in Redis
- Historical data in Cassandra
Interview Focus:
- Geospatial indexing (QuadTree vs Geohash)
- Matching algorithm efficiency
- Real-time location tracking
- Surge pricing calculation
- ETA accuracy
Common Follow-ups:
- "How to handle peak hours?"
- "Design Uber Pool (ride sharing)"
- "Optimize for driver earnings"
- "Handle driver going offline during trip"
7. Design Food Delivery (Uber Eats, DoorDash) 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Browse restaurants
- Place order
- Real-time order tracking
- Delivery person assignment
- Ratings and reviews
Non-Functional Requirements:
- Low latency
- High availability
- Accurate ETA
- Optimize delivery routes
Key Components:
- Restaurant Service
- Order Service
- Delivery Service (matching algorithm)
- Location Tracking Service
- Notification Service
Challenges:
- Three-way matching (customer, restaurant, delivery person)
- Multiple pickup and delivery optimization
- Keep food hot/fresh (time constraints)
Interview Focus:
- Three-way logistics optimization
- Route optimization for multiple orders
- Real-time tracking
Category C: Communication & Collaboration (Important)
8. Design WhatsApp / Chat Messenger 🔥🔥🔥
Difficulty: Hard | Frequency: Very High
Functional Requirements:
- One-on-one messaging
- Group chat
- Message delivery (sent, delivered, read)
- Online/offline status
- Media sharing
- End-to-end encryption
Non-Functional Requirements:
- 2B+ users
- Real-time delivery (<1 second)
- High availability
- Message persistence
Key Components:
WebSocket Server (for real-time)
Message Service
User Service
Group Service
Media Service
Notification Service
Cassandra (messages)
Redis (online status)
S3 (media storage)
Real-time Communication:
- WebSocket for bidirectional communication
- Long polling (fallback)
- XMPP protocol (extensible)
Message Flow:
- Sender → WebSocket Server
- Server checks receiver online status
- If online: Push via WebSocket
- If offline: Store in queue, send push notification
- Store message in DB (Cassandra)
- Acknowledge to sender
Message Storage:
Messages: message_id, sender_id, receiver_id, content, timestamp, status
Groups: group_id, name, members, created_by
Group_Messages: message_id, group_id, sender_id, content, timestamp
Read Receipts:
- Double tick (delivered)
- Blue tick (read)
- Send acknowledgments back to sender
Group Chat:
- Capped group size (WhatsApp's limit was 256 members for years and is now 1,024)
- Fan-out to all members
- Message ordering challenges
- Admin privileges
Media Sharing:
- Upload to S3
- Generate thumbnail
- Share URL in message
- Progressive download
End-to-End Encryption:
- Signal Protocol
- Public/private key exchange
- Server cannot read messages
Scalability:
- Shard by user_id
- Connection servers by region
- Separate servers for media
- Message queue for offline delivery
Interview Focus:
- Real-time message delivery
- Message ordering in groups
- Last seen and online status
- Encryption implementation
- Scale to billions of messages
Common Follow-ups:
- "How to implement message sync across devices?"
- "Design group admin features"
- "Handle user blocking"
- "Implement disappearing messages"
9. Design Slack / Microsoft Teams 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Workspaces and channels
- Direct messages
- File sharing
- Search messages
- Threads
- Reactions
- Integrations (bots, webhooks)
Non-Functional Requirements:
- Real-time messaging
- Message history
- High availability
- Low latency
Key Components:
WebSocket Gateway
Channel Service
Message Service
Search Service (Elasticsearch)
File Service
Notification Service
PostgreSQL + Cassandra
Redis Cache
Differences from WhatsApp:
- Workspace/channel hierarchy
- Thread replies
- Rich formatting
- Integrations and bots
- Search is critical
Channel Design:
- Public vs private channels
- Member management
- Channel history
- Unread counts
Search:
- Full-text search (Elasticsearch)
- Search within channels
- Filter by date, person, file type
- Message ranking
Scalability:
- Shard by workspace_id
- Separate WebSocket connections per workspace
- Cache channel metadata
Interview Focus:
- Workspace isolation
- Real-time typing indicators
- Thread implementation
- Search at scale
10. Design Zoom / Video Conferencing 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Video/audio streaming
- Screen sharing
- Chat
- Recording
- Virtual backgrounds
- Breakout rooms
Non-Functional Requirements:
- Low latency (<300 ms)
- High-quality video
- Support 100+ participants
- Reliable connectivity
Key Components:
Signaling Server (WebRTC)
Media Server (SFU - Selective Forwarding Unit)
TURN/STUN servers
Recording Service
Chat Service
Video Streaming:
- WebRTC for peer-to-peer
- SFU (Selective Forwarding Unit) for multi-party
- Participants send once to SFU
- SFU forwards to all participants
- Reduces bandwidth
- MCU (Multipoint Control Unit) - alternative
- Mixes all streams
- Higher server load
Architecture:
Client A ──┐
           ├──→ SFU Server ──→ Client C
Client B ──┘                   Client D
Bandwidth Optimization:
- Adaptive bitrate
- Simulcast (multiple qualities)
- Active speaker detection
- Gallery view vs speaker view
Scalability:
- Multiple SFU servers
- Route by geography
- Scale based on concurrent meetings
Interview Focus:
- WebRTC vs traditional streaming
- SFU vs MCU tradeoff
- Latency optimization
- Handle poor network conditions
Category D: Search & Discovery (Important)
11. Design Google Search 🔥🔥🔥
Difficulty: Very Hard | Frequency: High
Functional Requirements:
- Web crawling
- Indexing
- Search query processing
- Ranking results
- Autocomplete
- Spell correction
Non-Functional Requirements:
- Billions of web pages
- Sub-second query response
- High availability
- Fresh results
Key Components:
Web Crawler (distributed)
Indexer (MapReduce)
Index Storage (inverted index)
Query Processor
Ranking Service (PageRank)
Cache Layer
Web Crawling:
- Distributed crawlers
- URL frontier (queue)
- Politeness policy (robots.txt)
- Priority queue for recrawling
- Duplicate detection (URL fingerprinting)
Indexing:
- Inverted index: term → list of documents
- Forward index: document → list of terms
- MapReduce for distributed indexing
Example Inverted Index:
"apple" → [doc1, doc5, doc23, ...]
"orange" → [doc2, doc5, doc18, ...]
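To make the term-to-documents mapping concrete, here is a minimal in-memory sketch. The tokenizer (lowercase, alphanumeric runs) and the function names are assumptions; a production index would also store positions and term frequencies and would be sharded across machines.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """docs: {doc_id: text} -> {term: sorted list of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all_terms(index, query):
    """Return doc_ids that contain every term of the query (AND semantics)."""
    postings = [set(index.get(term, ())) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

A multi-term query is answered by intersecting posting lists, which is why the lists are usually kept sorted.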
Ranking:
- PageRank algorithm
- TF-IDF (Term Frequency-Inverse Document Frequency)
- Click-through rate
- Dwell time
- Freshness
- Authority
- 200+ ranking signals
Query Processing:
- Spell correction
- Query expansion (synonyms)
- Lookup inverted index
- Rank results
- Apply personalization
- Return top K results
Autocomplete:
- Trie data structure
- Precompute popular queries
- Personalization based on history
- Update based on trending searches
Scalability:
- Shard index by term
- Replicate for availability
- Cache popular queries
- Geo-distributed data centers
Interview Focus:
- Crawling strategy
- Inverted index design
- PageRank algorithm
- Query optimization
- Freshness vs relevance tradeoff
12. Design Typeahead / Autocomplete 🔥🔥
Difficulty: Medium | Frequency: High
Functional Requirements:
- Suggest queries as user types
- Top K suggestions
- Real-time updates
- Personalization
Non-Functional Requirements:
- Low latency (<100 ms)
- High availability
- Handle typos
- Scale to millions of queries
Key Components:
Trie data structure
Cache (Redis)
Analytics service (Kafka + Spark)
Database (Cassandra)
CDN
Data Structure:
- Trie with frequency counts
- Each node caches the top K completions for its prefix, so a lookup does not need to scan the subtree
Suggestion Generation:
- User types "fac"
- Traverse Trie to node "fac"
- Return precomputed top K suggestions
- "facebook"
- "facebook login"
- "factory"
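The lookup above can be sketched with a trie whose nodes cache their top K completions. This is an illustrative batch-built version: it assumes each query is added once with its final count, and the class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # cached (query, count) pairs, best first


class SuggestionTrie:
    def __init__(self, k=3):
        self.k = k
        self.root = TrieNode()

    def add(self, query, count):
        """Insert a query once and refresh the cached top-k on every prefix node."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((query, count))
            # Highest count first; ties broken alphabetically.
            node.top.sort(key=lambda qc: (-qc[1], qc[0]))
            del node.top[self.k:]

    def suggest(self, prefix):
        """Walk to the prefix node and return its precomputed suggestions."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for query, _ in node.top]
```

Reads cost O(length of prefix) regardless of how many queries share that prefix; the price is extra memory per node and more work at build time.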
Ranking:
- Query frequency
- Recency
- User personalization
- Geographic relevance
Updates:
- Batch processing (hourly/daily)
- Incremental updates
- A/B testing new suggestions
Scalability:
- Shard Trie by prefix
- Cache hot prefixes
- Separate Tries for different languages
Interview Focus:
- Trie optimization
- Real-time vs batch updates
- Personalization strategy
- Typo handling
Category E: Content & Media (Important)
13. Design TikTok / Short Video Platform 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Upload short videos (15-60 seconds)
- Personalized feed (For You page)
- Like, comment, share
- Follow users
- Trending content
Non-Functional Requirements:
- Billions of videos
- Highly engaging feed
- Low latency for video loading
- Recommendation accuracy
Key Components:
Video Upload Service
Transcoding Pipeline
Recommendation Engine (ML)
Feed Service
CDN
S3/Blob Storage
Redis Cache
For You Page (FYP) Algorithm:
- Collaborative filtering
- Content-based filtering
- User behavior signals:
- Watch time
- Completion rate
- Likes, shares, comments
- Replays
- Cold start problem (new users)
- Diversity injection (avoid echo chamber)
Video Pipeline:
- Upload → S3
- Transcode (multiple qualities)
- Extract features (AI/ML)
- Objects, faces, text
- Audio analysis
- Generate thumbnails
- Store metadata
- Push to CDN
Recommendation System:
- Real-time feature extraction
- Batch model training
- Online serving with low latency
- A/B testing new models
Scalability:
- Geo-distributed CDNs
- Separate hot/cold storage
- Pre-fetch next videos in feed
Interview Focus:
- Recommendation algorithm
- Video processing pipeline
- Infinite scroll implementation
- Content moderation at scale
14. Design Spotify / Music Streaming 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Stream music
- Search songs, artists, albums
- Playlists
- Recommendations
- Offline download
- Social features (share, follow)
Non-Functional Requirements:
- Millions of songs
- Low latency streaming
- High availability
- Personalization
Key Components:
Music Metadata Service
Streaming Service
Recommendation Engine
Playlist Service
CDN
Storage (S3)
Music Streaming:
- Audio formats (MP3, AAC, Ogg Vorbis)
- Multiple bitrates (96, 128, 320 kbps)
- Chunked streaming (similar to HLS)
- Pre-fetching next songs
- Offline caching
Recommendation:
- Collaborative filtering
- Audio feature analysis
- User listening history
- Playlist similarity
- Context-aware (time, mood, activity)
Playlist Management:
- User-created playlists
- Algorithm-generated playlists
- Discover Weekly
- Release Radar
- Daily Mix
Scalability:
- CDN for music files
- Cache popular songs at edge
- Separate recommendation service
Interview Focus:
- Streaming optimization
- Recommendation algorithm
- Offline mode implementation
- Social features integration
Category F: Booking & Reservation (Important)
15. Design Airbnb / Hotel Booking 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Search properties (location, dates, guests)
- View property details
- Booking and payment
- Reviews and ratings
- Host management
- Calendar management
Non-Functional Requirements:
- Global scale
- Accurate availability
- Prevent double booking
- Search performance
Key Components:
Search Service (Elasticsearch)
Booking Service
Payment Service
Calendar Service
Review Service
Recommendation Engine
Search:
- Geospatial search (lat, long, radius)
- Filters (price, amenities, property type)
- Ranking algorithm:
- Price
- Reviews
- Availability
- Host responsiveness
- Cancellation policy
Booking Flow:
- User selects dates
- Check availability (distributed lock)
- Reserve for 15 minutes
- Payment processing
- Confirm booking
- Update calendar
- Send confirmation
Calendar Management:
- Availability calendar per property
- Block dates for bookings
- Handle cancellations
- Sync with external calendars (iCal)
Prevent Double Booking:
- Distributed locks (Redis)
- Database transactions
- Optimistic locking
- Reservation expiry
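A rough sketch of a reservation hold with expiry, the idea behind "reserve for 15 minutes". `HoldStore` is a hypothetical in-memory stand-in for a lock store with TTL; in a single process the check-then-write below is safe, but a distributed version needs an atomic operation (for example a conditional write in the database or an atomic set-if-absent with expiry in the cache).

```python
import time


class HoldStore:
    """In-memory stand-in for a lock store with per-key expiry."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._holds = {}  # (property_id, date) -> (user_id, expires_at)

    def try_hold(self, property_id, dates, user_id):
        """Hold every requested date or none; an expired hold counts as free."""
        now = self.clock()
        keys = [(property_id, d) for d in dates]
        for key in keys:
            held = self._holds.get(key)
            if held and held[1] > now and held[0] != user_id:
                return False
        for key in keys:
            self._holds[key] = (user_id, now + self.ttl)
        return True

    def release(self, property_id, dates, user_id):
        """Drop the user's holds, e.g. after a failed payment."""
        for d in dates:
            if self._holds.get((property_id, d), (None,))[0] == user_id:
                del self._holds[(property_id, d)]
```

The hold only protects the checkout window. The final booking write should still be guarded by a database constraint or transaction so that a bug in the hold layer cannot produce a double booking.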
Database Design:
Properties: property_id, host_id, location, price, amenities
Bookings: booking_id, property_id, user_id, check_in, check_out, status
Calendar: property_id, date, available
Reviews: review_id, property_id, user_id, rating, comment
Scalability:
- Shard by geography
- Cache search results
- Separate booking and search services
- Async processing for reviews
Interview Focus:
- Double booking prevention
- Geospatial search
- Calendar synchronization
- Dynamic pricing
16. Design Ticketmaster / Event Booking 🔥🔥
Difficulty: Hard | Frequency: Medium
Functional Requirements:
- List events
- Seat selection
- Ticket booking
- Payment processing
- Ticket transfer
Non-Functional Requirements:
- Handle flash crowds (Taylor Swift effect)
- Prevent scalping (bots)
- Fair ticket distribution
Key Components:
Event Service
Seat Selection Service
Queue Service (virtual waiting room)
Payment Service
Anti-bot Service
Flash Sale Handling:
- Virtual waiting room (queue)
- Rate limiting per user
- CAPTCHA
- Token bucket algorithm
- Lottery system for high-demand
Seat Locking:
- Lock seat for 10 minutes during checkout
- Release if payment fails
- Distributed lock (Redis)
Anti-bot Measures:
- CAPTCHA
- Device fingerprinting
- Rate limiting
- Behavioral analysis
Interview Focus:
- Handle millions of concurrent users
- Fair ticket distribution
- Prevent bots and scalpers
- Seat locking mechanism
Category G: Collaborative & Productivity (Nice to Have)
17. Design Google Docs / Collaborative Editor 🔥🔥
Difficulty: Very Hard | Frequency: Medium
Functional Requirements:
- Real-time collaborative editing
- Conflict resolution
- Version history
- Comments and suggestions
- Offline mode
Non-Functional Requirements:
- Multiple users editing simultaneously
- Eventual consistency
- Low latency (<100 ms)
- Data persistence
Key Components:
WebSocket Server
Operational Transformation (OT) Engine
Conflict Resolution Service
Version Control Service
Storage Service
Operational Transformation (OT):
- Transform operations to handle conflicts
- Example:
- User A inserts "X" at position 5
- User B deletes character at position 3
- Transform B's operation considering A's insert
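The example above can be worked through in code. This is a deliberately tiny sketch of position shifting for single-character operations, with operations written as hypothetical tuples `("ins", pos, char)` and `("del", pos)`. It covers the insert-versus-delete case from the example only; it has no tie-breaking for two inserts at the same position or two deletes of the same character, which a real OT engine must handle.

```python
def apply(doc, op):
    """Apply one single-character operation to a string."""
    kind, pos, *rest = op
    if kind == "ins":
        return doc[:pos] + rest[0] + doc[pos:]
    return doc[:pos] + doc[pos + 1:]  # "del" removes one character


def transform(op, against):
    """Shift op's position so it can be applied after `against` has been applied."""
    kind, pos, *rest = op
    a_kind, a_pos, *_ = against
    if a_kind == "ins" and a_pos <= pos:
        pos += 1  # text was inserted at or before our position
    elif a_kind == "del" and a_pos < pos:
        pos -= 1  # a character before our position disappeared
    return (kind, pos, *rest)


doc = "abcdefgh"
op_a = ("ins", 5, "X")  # User A inserts "X" at position 5
op_b = ("del", 3)       # User B deletes the character at position 3

# Each site applies its own operation first, then the transformed remote one.
site_a = apply(apply(doc, op_a), transform(op_b, op_a))
site_b = apply(apply(doc, op_b), transform(op_a, op_b))
```

Both sites end with the same text: B's delete is unaffected because it sits before A's insert, while A's insert moves from position 5 to 4 on B's side.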
Alternative: CRDT (Conflict-free Replicated Data Types)
- Mathematical approach to merge conflicts
- Used by modern systems
- Examples: Yjs, Automerge
Real-time Sync:
- User types → Send operation to server
- Server broadcasts to all connected users
- Apply OT/CRDT to resolve conflicts
- Update document
- Acknowledge to all users
Version History:
- Snapshot every N operations
- Store diffs between versions
- Restore to any previous version
Scalability:
- Route all editors of a document to the same WebSocket server (sticky routing by doc_id)
- Shard documents by doc_id
- Eventual consistency model
Interview Focus:
- Operational Transformation vs CRDT
- Conflict resolution algorithm
- Real-time sync architecture
- Version control strategy
18. Design Dropbox / Google Drive 🔥🔥
Difficulty: Hard | Frequency: High
Functional Requirements:
- Upload/download files
- Sync across devices
- File sharing
- Version history
- Offline access
Non-Functional Requirements:
- Reliable file sync
- Efficient bandwidth usage
- Storage optimization
- High availability
Key Components:
Sync Service
Metadata Service
Block Storage (S3)
Notification Service
Client Application
File Synchronization:
- Chunking (4MB blocks)
- Delta sync (only changed blocks)
- Deduplication (same file hash)
- Compression
Sync Algorithm:
- Client hashes local files
- Send hashes to server
- Server compares with stored hashes
- Only upload changed blocks
- Server reconstructs file
- Notify other devices
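A minimal sketch of the hash comparison in the sync steps above, assuming fixed-size blocks and SHA-256. Function names are hypothetical. Note that with fixed-size blocks an insertion near the start of a file shifts every later block; some systems use content-defined chunking to avoid that.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the chunking bullet above


def block_hashes(data, block_size=BLOCK_SIZE):
    """SHA-256 hex digest of each fixed-size block of a bytes object."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(local_hashes, server_hashes):
    """Indexes of blocks that are new or differ from what the server has."""
    return [
        i for i, h in enumerate(local_hashes)
        if i >= len(server_hashes) or server_hashes[i] != h
    ]
```

The same hashes double as deduplication keys: if the server already stores a block with that digest for any file, the upload can be skipped.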
Metadata vs Data:
- Metadata: filename, path, size, modified date (SQL)
- Data: actual file content (Object storage)
Conflict Resolution:
- Last write wins (with timestamp)
- Create conflict copy (Filename_conflict_copy)
- User resolves manually
Scalability:
- Deduplicate at block level
- Compress files
- CDN for downloads
- Separate metadata and file storage
Interview Focus:
- Block-level deduplication
- Delta sync algorithm
- Conflict resolution
- Offline mode implementation
Category H: Payment & Financial (Nice to Have)
19. Design Paytm / Payment Wallet 🔥🔥
Difficulty: Hard | Frequency: Medium
Functional Requirements:
- Add money to wallet
- Send money to users
- Pay merchants
- Transaction history
- Offers and cashback
Non-Functional Requirements:
- Strong consistency (money)
- ACID transactions
- High availability
- Audit trail
Key Components:
Wallet Service
Transaction Service
Payment Gateway
Ledger Service (double-entry bookkeeping)
Notification Service
Transaction Flow:
- User initiates payment
- Validate balance
- Debit sender account (BEGIN TRANSACTION)
- Credit receiver account
- Record in ledger
- COMMIT or ROLLBACK
- Send notifications
Double-Entry Bookkeeping:
Transaction: A sends ₹100 to B
Debit: A's account -₹100
Credit: B's account +₹100
Must balance: -₹100 + ₹100 = 0
Idempotency:
- Same request twice shouldn't charge twice
- Use unique transaction ID
- Check for duplicate before processing
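The idempotency check and the double-entry ledger can be combined in one small sketch. This is an in-memory illustration with hypothetical names; a real wallet would perform the balance update, the ledger writes, and the idempotency record inside a single database transaction.

```python
class InsufficientFunds(Exception):
    pass


class Wallets:
    """In-memory sketch of idempotent transfers with a double-entry ledger."""

    def __init__(self, balances):
        self.balances = dict(balances)  # wallet_id -> balance in minor units
        self.ledger = []                # (txn_id, wallet_id, signed amount)
        self._results = {}              # txn_id -> result of the first attempt

    def transfer(self, txn_id, src, dst, amount):
        if txn_id in self._results:
            # Duplicate request: return the stored result, move no money.
            return self._results[txn_id]
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances[src] < amount:
            raise InsufficientFunds(src)
        self.balances[src] -= amount
        self.balances[dst] += amount
        self.ledger.append((txn_id, src, -amount))  # debit entry
        self.ledger.append((txn_id, dst, +amount))  # credit entry
        self._results[txn_id] = "ok"
        return "ok"
```

Amounts are integers in minor units (for example paise) to avoid floating-point rounding, and the ledger entries for any transaction always sum to zero.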
Database Design:
Wallets: wallet_id, user_id, balance
Transactions: txn_id, from_wallet, to_wallet, amount, status, timestamp
Ledger: entry_id, txn_id, wallet_id, debit/credit, amount
Scalability:
- Shard by user_id
- Read replicas for transaction history
- Strong consistency for wallet balance (master DB)
- Event sourcing for audit trail
Interview Focus:
- ACID transaction guarantees
- Idempotency handling
- Double-entry bookkeeping
- Reconciliation system
20. Design Stock Exchange / Trading Platform 🔥
Difficulty: Very Hard | Frequency: Low
Functional Requirements:
- Place orders (market, limit)
- Match orders
- Real-time price updates
- Order book
- Portfolio management
Non-Functional Requirements:
- Ultra-low latency (<1 ms)
- High throughput (millions of orders/sec)
- Strong consistency
- Fair order matching
Key Components:
Order Matching Engine
Order Book
Market Data Feed
Risk Management
Clearing and Settlement
Order Matching:
- Price-Time Priority
- Order book (binary heap or order queue)
- FIFO for same price
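Price-time priority can be illustrated with two heaps and an arrival counter. This sketch handles limit orders for a single symbol only and trades at the resting order's price; the class name, tuple layout, and lack of cancel support are simplifying assumptions.

```python
import heapq
import itertools


class OrderBook:
    """Limit orders for one symbol, matched by price-time priority."""

    def __init__(self):
        self._bids = []  # max-heap via negated price: (-price, seq, order_id, qty)
        self._asks = []  # min-heap: (price, seq, order_id, qty)
        self._seq = itertools.count()  # arrival order breaks price ties (FIFO)

    def submit(self, order_id, side, price, qty):
        """Match against the opposite side, then rest any remainder.

        Returns a list of trades as (buy_id, sell_id, price, qty).
        """
        is_buy = side == "buy"
        book, opposite = (self._bids, self._asks) if is_buy else (self._asks, self._bids)
        trades = []
        while qty > 0 and opposite:
            key, seq, resting_id, resting_qty = opposite[0]
            resting_price = key if is_buy else -key
            crosses = price >= resting_price if is_buy else price <= resting_price
            if not crosses:
                break
            fill = min(qty, resting_qty)
            buy_id, sell_id = (order_id, resting_id) if is_buy else (resting_id, order_id)
            trades.append((buy_id, sell_id, resting_price, fill))
            qty -= fill
            if fill == resting_qty:
                heapq.heappop(opposite)
            else:
                # Partial fill keeps the original sequence number, so priority is unchanged.
                heapq.heapreplace(opposite, (key, seq, resting_id, resting_qty - fill))
        if qty > 0:
            heapq.heappush(book, (-price if is_buy else price, next(self._seq), order_id, qty))
        return trades
```

Heaps give O(log n) insert and best-price lookup. Supporting cancels efficiently usually leads to a different layout, such as a sorted map of price levels with a FIFO queue per level.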
Order Types:
- Market order (execute immediately at best price)
- Limit order (execute at specified price or better)
- Stop order
- Good-till-cancelled (GTC)
Scalability:
- In-memory matching engine (C++)
- Low-latency network (kernel bypass, RDMA)
- Separate matching engine per symbol
- Hot/cold data separation
Interview Focus:
- Order matching algorithm
- Latency optimization techniques
- Fair order execution
- Risk management
6️⃣ System Components Deep Dive 🟡
1. Content Delivery Network (CDN) 🔥🔥
Purpose:
- Serve static content closer to users
- Reduce latency
- Reduce origin server load
- DDoS protection
How it Works:
- User requests image from CDN
- CDN checks if cached at edge
- If yes, serve from edge (cache hit)
- If no, fetch from origin, cache, and serve (cache miss)
Popular CDNs:
- Cloudflare
- Akamai
- Amazon CloudFront
- Fastly
Use Cases:
- Images, videos
- JavaScript, CSS files
- Downloadable content
2. Reverse Proxy 🔥
Purpose:
- Load balancing
- SSL termination
- Caching
- Security (hide backend)
Examples: NGINX, HAProxy
3. API Gateway 🔥🔥
Purpose:
- Single entry point for all clients
- Authentication and authorization
- Rate limiting
- Request routing
- Response aggregation
- API versioning
Examples:
- Kong
- AWS API Gateway
- Apigee
4. Service Mesh 🔥
Purpose:
- Microservice communication management
- Service discovery
- Load balancing
- Observability
- Security (mTLS)
Examples:
- Istio
- Linkerd
- Consul
5. Distributed Locking 🔥🔥
Purpose:
- Coordinate access to shared resources
- Prevent race conditions
Implementations:
- Redis (Redlock algorithm)
- ZooKeeper
- etcd
- Database-based locks
Use Cases:
- Preventing double booking
- Leader election
- Distributed cron jobs
6. Rate Limiting 🔥🔥
Algorithms:
- Token Bucket - Allows short bursts up to the bucket size while enforcing an average rate
- Leaky Bucket - Constant outflow, smooths bursts
- Fixed Window - Simple but has burst issue
- Sliding Window - More accurate
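As one example of the algorithms above, a token bucket fits in a few lines. This is a single-process sketch with an injectable clock for testing; the class name and parameters are assumptions, and a distributed limiter would keep the token count and timestamp in a shared store and update them atomically.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec    # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request fits, else False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate_per_sec=1` and `capacity=2`, a client can send two requests back to back and then one per second afterwards.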
Implementation:
- Redis counters
- In-memory (local rate limiting)
- Distributed (global rate limiting)
Use Cases:
- API rate limiting (1000 requests/hour)
- Login attempts (5 attempts/15 minutes)
- Payment processing
7. Distributed Tracing 🔥
Purpose:
- Track requests across microservices
- Performance monitoring
- Debugging
Tools:
- Jaeger
- Zipkin
- AWS X-Ray
Concepts:
- Trace ID (spans entire request)
- Span ID (individual service call)
8. Circuit Breaker 🔥
Purpose:
- Prevent cascading failures
- Fail fast when service is down
- Give service time to recover
States:
- Closed (normal operation)
- Open (service failing, reject requests)
- Half-Open (test if service recovered)
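The three states map onto a small state machine. The sketch below is illustrative rather than a copy of any library's behavior: the threshold, the timeout, and the choice to count consecutive failures are assumptions, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half_open"  # timeout elapsed: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

Callers wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to use a fallback instead of waiting on a service that is known to be down.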
Tools:
- Hystrix (deprecated but concept important)
- Resilience4j
9. Service Discovery 🔥
Purpose:
- Find service instances dynamically
- Handle dynamic scaling
- Health checks
Types:
- Client-side discovery (Netflix Eureka)
- Server-side discovery (Consul, etcd)
Examples:
- Consul
- Eureka
- ZooKeeper
- etcd
10. Time-Series Database 🔥
Purpose:
- Store metrics and logs
- Time-based queries
- Aggregations
Examples:
- InfluxDB
- TimescaleDB
- Prometheus
Use Cases:
- Application metrics
- Server monitoring
- IoT sensor data
11. Full-Text Search Engine 🔥🔥
Elasticsearch Deep Dive:
Key Concepts:
- Documents (JSON objects)
- Index (collection of documents)
- Shards (horizontal partitioning)
- Replicas (copies for availability)
Inverted Index:
"quick brown fox" → tokenize → [quick, brown, fox]
Index:
quick → [doc1, doc5]
brown → [doc1, doc3]
fox → [doc1, doc2, doc5]
Query Types:
- Match query (full-text search)
- Term query (exact match)
- Bool query (combine multiple queries)
- Range query (dates, numbers)
Scoring:
- TF-IDF (Term Frequency-Inverse Document Frequency)
- BM25 (improved relevance; the default similarity in Elasticsearch since 5.0)
Use Cases:
- Product search
- Log aggregation (ELK stack)
- Application search
12. Object Storage 🔥🔥
S3 Deep Dive:
Features:
- Store any type of file
- Unlimited storage
- 99.999999999% (11 9's) durability
- Bucket and object model
Storage Classes:
- S3 Standard (frequent access)
- S3 Infrequent Access (IA)
- S3 Glacier (archival)
Use Cases:
- Media files (images, videos)
- Backups
- Data lakes
- Static website hosting
Best Practices:
- Use CloudFront CDN
- Enable versioning
- Lifecycle policies
- Pre-signed URLs for secure access
13. Graph Databases 🔥
Purpose:
- Store relationships efficiently
- Graph traversal queries
Examples:
- Neo4j
- Amazon Neptune
- ArangoDB
Use Cases:
- Social networks (friend relationships)
- Recommendation engines
- Fraud detection
- Knowledge graphs
When to Use:
- Many-to-many relationships
- Complex join queries in SQL
- Path finding problems
14. Vector Databases 🔥 (New in 2024-25)
Purpose:
- Store embeddings (vectors)
- Semantic search
- Similarity search
Examples:
- Pinecone
- Weaviate
- Milvus
- Qdrant
Use Cases:
- AI/ML applications
- Recommendation systems
- Image similarity
- Semantic search
- RAG (Retrieval Augmented Generation) for LLMs
Why Important:
- Rise of LLMs and AI applications
- Vector embeddings for semantic meaning
15. Streaming Platforms 🔥🔥
Apache Kafka Deep Dive:
Key Concepts:
- Topics (channels)
- Partitions (parallel processing)
- Producers (write)
- Consumers (read)
- Consumer Groups (load balancing)
Use Cases:
- Real-time analytics
- Log aggregation
- Event sourcing
- CDC (Change Data Capture)
Kafka vs Message Queue:
- Kafka: High throughput, persistent, replay
- MQ: Lower latency, transient, no replay
Other Options:
- Apache Pulsar
- Amazon Kinesis
- Google Pub/Sub
7️⃣ Database Scaling Patterns 🔴
1. Replication 🔥🔥🔥
Master-Slave (Primary-Replica):
- All writes go to master
- Reads from replicas
- Asynchronous replication
- Replication lag possible
Use Cases:
- Read-heavy applications
- Analytics on replicas
- Geographic distribution
Master-Master:
- Both can accept writes
- Conflict resolution needed
- More complex
2. Sharding (Horizontal Partitioning) 🔥🔥🔥
Sharding Strategies:
1. Range-Based Sharding:
- Users A-M → Shard 1
- Users N-Z → Shard 2
- Pros: Simple, range queries easy
- Cons: Uneven distribution (hotspots)
2. Hash-Based Sharding:
- Hash(user_id) % num_shards
- Pros: Even distribution
- Cons: Range queries difficult, resharding hard
3. Consistent Hashing:
- Virtual nodes on hash ring
- Pros: Minimal data movement when scaling
- Cons: More complex
4. Directory-Based:
- Lookup table maps keys to shards
- Pros: Flexible
- Cons: Single point of failure (directory service)
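Of the four strategies, consistent hashing is the one interviewers most often ask to see in detail. Below is a minimal sketch of a hash ring with virtual nodes. The use of MD5 as the ring hash, 100 virtual nodes per shard, and the names `HashRing` and `_hash` are assumptions chosen for brevity.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` virtual points for the node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get(self, key):
        """First virtual node clockwise from the key's hash, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

Adding a shard only takes keys away from existing shards and assigns them to the new one; keys never move between the old shards. With `Hash(user_id) % num_shards`, by contrast, changing `num_shards` remaps most keys.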
Challenges:
- Cross-shard queries
- Distributed transactions
- Resharding (when adding shards)
- Hotspot handling
3. Partitioning (Vertical) 🔥
Split tables by columns:
- User basic info → Partition 1
- User extended profile → Partition 2
Benefits:
- Reduce I/O
- Different storage types for different data
4. Denormalization 🔥🔥
Purpose:
- Optimize read performance
- Reduce joins
Trade-off:
- Faster reads
- Slower writes
- Data duplication
- Consistency challenges
Example:
Normalized:
Users: user_id, name
Posts: post_id, user_id, content
Denormalized:
Posts: post_id, user_id, user_name, content
(user_name duplicated)
5. CQRS (Command Query Responsibility Segregation) 🔥
Concept:
- Separate read and write models
- Optimize each independently
Architecture:
Write Model (Commands) → PostgreSQL (normalized)
        ↓ (sync via events)
Read Model (Queries) → Elasticsearch (denormalized)
Use Cases:
- Complex domain logic
- Read-heavy with complex queries
- Event sourcing
8️⃣ Advanced Topics (2024-2025 Trends) 🟡
1. Serverless Architecture 🔥
AWS Lambda, Google Cloud Functions:
- No server management
- Auto-scaling
- Pay per invocation
Use Cases:
- Event-driven tasks
- Scheduled jobs
- API backends (with API Gateway)
Limitations:
- Cold start latency
- Execution time limits (15 min AWS Lambda)
- Vendor lock-in
2. Edge Computing 🔥
Concept:
- Process data closer to users
- Reduce latency
- Cloudflare Workers, AWS Lambda@Edge
Use Cases:
- A/B testing at edge
- Personalization
- Bot detection
- Image optimization
3. Event-Driven Architecture 🔥🔥
Components:
- Event producers
- Event bus (Kafka, SNS, EventBridge)
- Event consumers
Benefits:
- Loose coupling
- Scalability
- Async processing
Patterns:
- Event Notification
- Event-Carried State Transfer
- Event Sourcing
- CQRS
4. Data Lakes & Warehouses 🔥
Data Lake:
- Store raw data (all formats)
- S3, Azure Data Lake
- Schema-on-read
Data Warehouse:
- Structured data
- Optimized for analytics
- Redshift, Snowflake, BigQuery
- Schema-on-write
Modern: Data Lakehouse:
- Combines benefits of both
- Delta Lake, Apache Iceberg
5. Real-Time Analytics 🔥
Stream Processing:
- Apache Flink
- Apache Spark Streaming
- Kafka Streams
Use Cases:
- Real-time dashboards
- Fraud detection
- Anomaly detection
- Real-time recommendations
6. Multi-Tenancy 🔥
Approaches:
1. Separate Database per Tenant:
- Pros: Isolation, easy backup
- Cons: Expensive, harder to scale
2. Shared Database, Separate Schema:
- Pros: Medium isolation
- Cons: Schema management
3. Shared Database, Shared Schema:
- Pros: Cost-effective, easy to scale
- Cons: Less isolation, tenant_id in every table
Considerations:
- Data isolation
- Performance isolation
- Compliance requirements
7. Feature Flags / Toggles 🔥
Purpose:
- Deploy features disabled
- Enable for specific users
- A/B testing
- Gradual rollout
- Kill switch
Tools:
- LaunchDarkly
- Split.io
- Unleash
- Custom (Redis-based)
8. Chaos Engineering 🔥
Concept:
- Intentionally inject failures
- Test system resilience
- Identify weaknesses
Tools:
- Chaos Monkey (Netflix)
- Gremlin
- Chaos Mesh
Practices:
- Random instance termination
- Network latency injection
- Disk failure simulation
9. Observability (O11y) 🔥🔥
Three Pillars:
1. Metrics:
- Numerical measurements
- Prometheus, Grafana
- Examples: CPU, memory, request count
2. Logs:
- Discrete events
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Splunk, Datadog
3. Traces:
- Request flow across services
- Jaeger, Zipkin
Modern: OpenTelemetry:
- Unified standard for metrics, logs, traces
10. AI/ML Integration in System Design 🔥🔥 (2025 Trend)
Common ML Components:
1. Recommendation Systems:
- Collaborative filtering
- Content-based filtering
- Hybrid approaches
- Real-time vs batch predictions
2. Search Ranking:
- Learning to Rank (LTR)
- Feature engineering
- Model serving
3. Content Moderation:
- Image/text classification
- ML models for harmful content
4. Personalization:
- User embeddings
- Context-aware models
ML Serving Architecture:
Client → API Gateway → Model Server (TensorFlow Serving, TorchServe)
                            ↓
                     Feature Store (Redis, Feast)
                            ↓
                     Model Registry (MLflow)
Challenges:
- Model versioning
- A/B testing models
- Feature drift
- Real-time inference latency
- Model monitoring
9️⃣ Interview Strategy & Framework 🎯
The RESHADED Framework (45-60 min interview)
Timeline:
1. Requirements (5-7 minutes) 🔥🔥🔥
- Clarify functional requirements
- Clarify non-functional requirements
- Ask about scale
- Identify constraints
Example Questions to Ask:
- "How many users are we expecting?"
- "What's the read/write ratio?"
- "Do we need strong consistency or eventual consistency?"
- "What's the expected latency?"
- "Do we need to support offline mode?"
- "What are the most critical features?"
2. Estimations (5 minutes) 🔥🔥
Back-of-envelope Calculations:
Example: Design Instagram
DAU (Daily Active Users): 500M
Assumptions:
- Each user posts 1 photo/day
- Each photo is 2MB
- Each user views 50 photos/day
Storage:
- Daily: 500M * 1 * 2MB = 1,000 TB/day = 1 PB/day
- Yearly: 1 PB * 365 = 365 PB/year
Bandwidth:
Read:
- 500M * 50 * 2MB / 86400 seconds = ~580 GB/s
Write:
- 500M * 1 * 2MB / 86400 seconds = ~11.6 GB/s
QPS:
Read: 500M * 50 / 86400 = ~289K QPS
Write: 500M * 1 / 86400 = ~5.8K QPS
Memory Estimates (80-20 Rule):
- Cache 20% of daily traffic
- 80% of requests hit cache
Useful Numbers:
1 Million requests/day = ~12 requests/second
1 Billion requests/day = ~12K requests/second
1 Petabyte = 1,000 Terabytes = 1,000,000 Gigabytes
1 Day = 86,400 seconds
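The Instagram arithmetic above is easy to re-derive with a few lines of code, which is a handy way to practice before an interview. The function name and dictionary keys are hypothetical; units are decimal (1 MB = 10^6 bytes) to match the figures in the example.

```python
SECONDS_PER_DAY = 86_400
MB, GB, PB = 10**6, 10**9, 10**15  # decimal units


def estimate(dau, posts_per_user, views_per_user, photo_bytes):
    """Back-of-envelope storage, bandwidth, and QPS from daily activity."""
    writes = dau * posts_per_user
    reads = dau * views_per_user
    return {
        "storage_pb_per_day": writes * photo_bytes / PB,
        "write_qps": writes / SECONDS_PER_DAY,
        "read_qps": reads / SECONDS_PER_DAY,
        "write_gb_per_s": writes * photo_bytes / SECONDS_PER_DAY / GB,
        "read_gb_per_s": reads * photo_bytes / SECONDS_PER_DAY / GB,
    }


numbers = estimate(
    dau=500_000_000, posts_per_user=1, views_per_user=50, photo_bytes=2 * MB
)
```

These are daily averages. Peak traffic is commonly assumed to be a small multiple of the average, so state the multiplier you pick when sizing for peak.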
3. System Interface / API Design (5 minutes) 🔥🔥
Define APIs:
Example: Twitter
POST /api/v1/tweets
Body: { user_id, content, media_urls }
Response: { tweet_id, created_at }
GET /api/v1/timeline/{user_id}
Params: page, limit
Response: { tweets: [...], next_page_token }
POST /api/v1/follow
Body: { follower_id, followee_id }
Response: { success: true }
GET /api/v1/search
Params: query, page, limit
Response: { tweets: [...], users: [...] }
Important:
- Define request/response structure
- Mention authentication (JWT, OAuth)
- Versioning (/api/v1/)
- Rate limiting
4. High-Level Design (10-15 minutes) 🔥🔥🔥
Draw Architecture Diagram:
Components to Include:
- Client (Web/Mobile)
- Load Balancer
- API Gateway
- Application Servers
- Caches (Redis)
- Databases (SQL/NoSQL)
- Object Storage (S3)
- CDN
- Message Queue (Kafka)
- Search Service (Elasticsearch)
Example Flow:
Mobile App → Load Balancer → API Gateway
                                  ↓
                  App Servers → Redis Cache
                       ↓             ↓
                   Database ← (cache miss)
                       ↓
                  Kafka → Workers
                       ↓
                  S3 (media files)
Key Points:
- Explain each component's purpose
- Show data flow with arrows
- Mention protocols (HTTP, WebSocket, gRPC)
- Talk about data storage choices
5. Detailed Design (15-20 minutes) 🔥🔥🔥
Deep Dive into 2-3 Core Components:
Interviewer will ask:
- "How would you implement the feed generation?"
- "Design the database schema"
- "How would you handle real-time updates?"
Choose components to detail:
- Most critical features
- Challenging technical problems
- Areas you're strong in
Example: Twitter Timeline Generation
Approach 1: Fan-out on Write (Push)
User tweets → Write to all followers' timelines
Pros: Fast reads
Cons: Slow writes for celebrities, wasted space
When to use: Users with < 10K followers
Approach 2: Fan-out on Read (Pull)
User requests timeline → Fetch tweets from followed users
Pros: Fast writes, no wasted space
Cons: Slow reads
When to use: Celebrities with millions of followers
Approach 3: Hybrid
Normal users: Fan-out on write
Celebrities: Fan-out on read
Best of both worlds
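The hybrid approach can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `TimelineService` class, its field names, and the configurable `celebrity_threshold` are all assumptions made for the example (the 10K default mirrors the rule of thumb above).

```python
from collections import defaultdict


class TimelineService:
    """In-memory sketch of hybrid fan-out. All names here are illustrative."""

    def __init__(self, celebrity_threshold=10_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> followers
        self.following = defaultdict(set)          # user -> authors followed
        self.pushed = defaultdict(list)            # user -> tweets fanned out on write
        self.celebrity_tweets = defaultdict(list)  # author -> tweets pulled on read

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def post(self, author, ts, text):
        tweet = (ts, author, text)
        if len(self.followers[author]) >= self.celebrity_threshold:
            # Celebrity: store once; followers pull it when they read.
            self.celebrity_tweets[author].append(tweet)
        else:
            # Normal user: push a copy into every follower's timeline.
            for follower in self.followers[author]:
                self.pushed[follower].append(tweet)

    def timeline(self, user, limit=20):
        tweets = list(self.pushed[user])
        for author in self.following[user]:
            tweets.extend(self.celebrity_tweets[author])
        tweets.sort(reverse=True)  # newest first (tuples sort by timestamp)
        return tweets[:limit]
```

Deciding push versus pull at post time keeps each tweet in exactly one place, so the read path never has to de-duplicate. A real system would also need to handle users who cross the threshold over time.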
6. Database Design (5-7 minutes) 🔥🔥
Schema Design:
Example: E-commerce
Users:
- user_id (PK)
- email
- name
- created_at
Products:
- product_id (PK)
- name
- description
- price
- category_id
- stock_quantity
Orders:
- order_id (PK)
- user_id (FK)
- total_amount
- status (pending, paid, shipped, delivered)
- created_at
Order_Items:
- id (PK)
- order_id (FK)
- product_id (FK)
- quantity
- price_at_purchase
Cart (composite primary key: user_id + product_id):
- user_id (PK, FK)
- product_id (PK, FK)
- quantity
- added_at
Decisions:
- SQL vs NoSQL (explain why)
- Normalization vs denormalization
- Indexing strategy
- Sharding key
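To make the schema decisions concrete, here is a hedged sketch that creates part of the e-commerce schema in an in-memory SQLite database. SQLite is used only so the example runs anywhere; the column types, the `price_cents`/`total_cents` naming, the constraints, and the index are assumptions for illustration, and `Order_Items` plus a few product columns are omitted for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    product_id     INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    price_cents    INTEGER NOT NULL,  -- integer cents avoid float rounding
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    total_cents INTEGER NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'paid', 'shipped', 'delivered')),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE cart (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    added_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, product_id)  -- one row per user/product pair
);
-- Supports "list a user's recent orders" without a full table scan.
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
"""


def create_db():
    """Create the sketch schema in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key on `cart` means "add to cart" for an existing pair should update `quantity` rather than insert a second row.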
7. Scalability & Bottlenecks (5-7 minutes) 🔥🔥🔥
Identify Bottlenecks:
- Database (single point)
- Application servers
- Network bandwidth
- Cache invalidation
Solutions:
Database Bottleneck:
- Read replicas
- Sharding
- Caching
Application Server Bottleneck:
- Horizontal scaling
- Load balancing
- Stateless services
Network Bottleneck:
- CDN
- Compression
- Caching
Storage Bottleneck:
- Distributed storage
- Tiered storage (hot/cold)
8. Deep Dives & Trade-offs (5-10 minutes) 🔥🔥
Interviewer may ask:
- "What if a celebrity with 100M followers tweets?"
- "How would you handle failures?"
- "What about data consistency?"
Discuss Trade-offs:
- Consistency vs Availability when a network partition occurs (CAP)
- Latency vs Throughput
- Cost vs Performance
- Complexity vs Simplicity
Failure Scenarios:
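- Database down → Fail over to a replica (promote it for writes, keep serving reads from replicas)
- Cache down → Fall back to database (degraded performance)
- Message queue down → Retry with exponential backoff
- Network partition → Decide per feature: reject writes (consistency) or accept them and reconcile later (eventual consistency)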
Common Interview Questions & Answers 🔥
Generic Questions
Q: "SQL vs NoSQL - when to use what?"
Answer:
Use SQL when:
✅ ACID transactions required (banking, e-commerce orders)
✅ Complex queries with JOINs
✅ Structured data
✅ Data integrity is critical
Use NoSQL when:
✅ High write throughput (logging, IoT)
✅ Flexible schema (user profiles)
✅ Horizontal scaling needed
✅ Eventual consistency acceptable
✅ Key-value access patterns
Examples:
- E-commerce orders → SQL (PostgreSQL)
- User sessions → NoSQL (Redis)
- Product catalog → NoSQL (MongoDB)
- Social media feeds → NoSQL (Cassandra)
Q: "How do you prevent race conditions in distributed systems?"
Answer:
1. Distributed Locks (Redis, ZooKeeper)
2. Optimistic Locking (version numbers)
3. Database Transactions (ACID)
4. Idempotency (same request = same result)
5. Atomic operations (INCR in Redis)
Example: Prevent double booking
- Acquire distributed lock on resource_id
- Check availability
- Make booking
- Release lock
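Use Redis: SET key token NX PX 30000 (plain SETNX sets no expiry, so a crashed holder would never release the lock)
If returns OK → lock acquired
If returns nil → lock already held
Release only if the stored token is still yours (compare-and-delete), so you never delete another client's lock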
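The double-booking flow above can be sketched without a running Redis server. The `InMemoryLockStore` below is a hypothetical stand-in that mimics an atomic set-if-absent with expiry (in Redis, `SET key token NX PX ttl`) plus a token-checked release; it is not the redis-py API, and `book_seat` is an illustrative helper.

```python
import time
import uuid


class InMemoryLockStore:
    """Stand-in for Redis lock semantics (hypothetical helper, not redis-py)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (token, expires_at)

    def set_nx_px(self, key, token, ttl_ms):
        """Set key only if absent or expired; return True if the lock was acquired."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # lock already held and not expired
        self._data[key] = (token, now + ttl_ms / 1000)
        return True

    def delete_if_equal(self, key, token):
        """Release the lock only if we still own it (compare-and-delete)."""
        current = self._data.get(key)
        if current is not None and current[0] == token:
            del self._data[key]
            return True
        return False


def book_seat(store, bookings, seat_id, user_id, ttl_ms=5000):
    """Acquire lock, check availability, book, release. Returns True on success."""
    key = f"lock:seat:{seat_id}"
    token = uuid.uuid4().hex  # unique per attempt, identifies the lock owner
    if not store.set_nx_px(key, token, ttl_ms):
        return False  # someone else is booking this seat right now
    try:
        if seat_id in bookings:
            return False  # already booked
        bookings[seat_id] = user_id
        return True
    finally:
        store.delete_if_equal(key, token)
```

The expiry protects against a holder that crashes before releasing, and the token check protects against a slow holder deleting a lock that has since expired and been taken by someone else.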
Q: "How do you handle high traffic / flash sales?"
Answer:
1. Rate Limiting (per user, per IP)
2. Queue System (virtual waiting room)
3. Caching (aggressive caching of product details)
4. CDN (static content)
5. Database Optimization:
- Read replicas
- Connection pooling
6. Horizontal Scaling (auto-scaling)
7. Graceful Degradation:
- Disable non-critical features
- Show cached data
8. Pre-warming Cache
9. Bot Detection (CAPTCHA)
Example: iPhone launch on Amazon
- Queue 1M users → virtual waiting room
- Release in batches (1000 at a time)
- Rate limit checkouts
- Reserve inventory with distributed locks
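Per-user rate limiting, the first item in the list above, is often implemented as a token bucket. The sketch below is one minimal way to do it; the class name, parameters, and injectable `clock` are assumptions for illustration, and a multi-server deployment would keep the bucket state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate_per_sec`, bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so initial bursts are allowed
        self._last = clock()

    def allow(self, cost=1):
        """Return True and consume tokens if the request fits, else False."""
        now = self._clock()
        # Refill based on elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per user id (hypothetical wrapper for the checkout endpoint)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self._args = (rate_per_sec, capacity, clock)
        self._buckets = {}

    def allow(self, user_id):
        bucket = self._buckets.setdefault(user_id, TokenBucket(*self._args))
        return bucket.allow()
```

A rejected request would typically get an HTTP 429 response so that well-behaved clients back off.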
Q: "How do you ensure data consistency across microservices?"
Answer:
1. Saga Pattern (distributed transactions)
- Choreography (event-driven)
- Orchestration (coordinator)
2. Event Sourcing
- Store events, not state
- Replay events to rebuild state
3. 2PC (Two-Phase Commit)
- Coordinator asks: Can you commit?
- All say yes → Commit
- Any says no → Rollback
- Problem: Blocking, coordinator SPOF
4. Eventual Consistency
- Accept temporary inconsistency
- Use message queues for async updates
Example: Order Service + Payment Service + Inventory Service
Saga Pattern:
1. Order Service creates order (pending)
2. Payment Service charges card → Success
3. Inventory Service decrements stock → Success
4. Order Service updates order (confirmed)
If any step fails → Compensating transactions (rollback)
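An orchestrated saga for the order example can be sketched as a list of steps, each paired with its compensating action. Everything here (`run_saga`, `SagaFailed`, `place_order`, and the log strings) is illustrative; a real orchestrator would also persist saga state so it can resume after a crash.

```python
class SagaFailed(Exception):
    """Raised after compensations have run for a failed saga."""


def run_saga(steps):
    """Run (name, action, compensate) steps in order; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Compensate in reverse order so later effects are undone first.
            for _, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step '{name}' failed") from exc
        completed.append((name, compensate))


def place_order(log, stock):
    """Hypothetical order flow; `log` records what each service did."""

    def reserve_stock():
        if stock["qty"] < 1:
            raise RuntimeError("out of stock")
        stock["qty"] -= 1
        log.append("inventory:decremented")

    def release_stock():
        stock["qty"] += 1
        log.append("inventory:restored")

    run_saga([
        ("create_order", lambda: log.append("order:pending"),
         lambda: log.append("order:cancelled")),
        ("charge_card", lambda: log.append("payment:charged"),
         lambda: log.append("payment:refunded")),
        ("reserve_stock", reserve_stock, release_stock),
    ])
    log.append("order:confirmed")
```

When the inventory step fails, the payment is refunded and the order cancelled, in that order. Compensations should themselves be idempotent because they may be retried.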
Q: "How do you handle failures and ensure reliability?"
Answer:
1. Redundancy
- Multiple instances
- No single point of failure
2. Replication
- Database replicas
- Cross-region replication
3. Health Checks
- Liveness probes
- Readiness probes
4. Circuit Breaker
- Fail fast when service down
- Prevent cascading failures
5. Retry with Exponential Backoff
- Don't overwhelm failing service
6. Bulkhead Pattern
- Isolate resources (thread pools)
- Failure in one area doesn't affect others
7. Graceful Degradation
- Serve cached/stale data
- Disable non-critical features
8. Monitoring & Alerts
- Real-time metrics
- On-call rotation
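The circuit breaker from item 4 can be sketched as a small wrapper around a remote call. This is a simplified illustration with assumed names and defaults (`failure_threshold`, `reset_timeout`); production libraries add per-error policies, metrics, and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on timeouts, which is what prevents a slow dependency from exhausting threads upstream.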
Q: "How do you optimize database queries?"
Answer:
1. Indexing
- B-tree indexes for range queries
- Hash indexes for equality
- Composite indexes for multiple columns
- Don't over-index (slows writes)
2. Query Optimization
- Use EXPLAIN to analyze
- Avoid SELECT *
- Use JOINs wisely
- Limit result sets
3. Caching
- Cache frequently accessed data
- Redis, Memcached
4. Denormalization
- Pre-compute aggregations
- Duplicate data to avoid JOINs
5. Partitioning
- Horizontal (sharding)
- Vertical (split columns)
6. Read Replicas
- Route reads to replicas
7. Connection Pooling
- Reuse connections
8. Pagination
- Don't fetch all at once
- Cursor-based or offset-based
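Cursor-based (keyset) pagination from item 8 can be sketched as follows. SQLite and the `tweets` table are assumptions so the example is self-contained; the key idea is that each page is found with an indexed `WHERE tweet_id < ?` seek instead of an `OFFSET`, which would scan and discard all earlier rows.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with n sample rows (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO tweets (tweet_id, content) VALUES (?, ?)",
        [(i, f"tweet {i}") for i in range(1, n + 1)],
    )
    return conn


def fetch_page(conn, limit, cursor=None):
    """Return (rows, next_cursor), newest first. `cursor` is the last tweet_id seen."""
    if cursor is None:
        rows = conn.execute(
            "SELECT tweet_id, content FROM tweets ORDER BY tweet_id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT tweet_id, content FROM tweets WHERE tweet_id < ? "
            "ORDER BY tweet_id DESC LIMIT ?",
            (cursor, limit),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor
```

Keyset pagination also stays stable when new rows arrive between requests, whereas offset-based pages can shift and show duplicates.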
1️⃣1️⃣ Study Plan (12-16 Weeks)
Week 1-2: LLD Fundamentals
Focus: OOP, SOLID, Design Patterns
- Study SOLID principles with examples
- Learn 5 key design patterns (Singleton, Factory, Strategy, Observer, Builder)
- Practice UML diagrams
Practice:
- Design a Parking Lot
- Design a Vending Machine
- Implement Singleton pattern (thread-safe)
Week 3-4: LLD Problems (Easy to Medium)
Focus: Common LLD interview problems
- Library Management System
- Hotel Booking System
- ATM System
- Chess Game
Practice:
- Code one problem in your preferred language
- Draw class diagrams
- Discuss with peers / post on forums
Week 5-6: HLD Fundamentals
Focus: Core concepts
- Scalability (horizontal vs vertical)
- Load balancing
- Caching strategies
- Database fundamentals (SQL vs NoSQL)
- CAP theorem
Practice:
- Design URL Shortener (simple problem)
- Estimate storage and bandwidth for various apps
Week 7-8: HLD - Social Media & Content
Focus: High-traffic systems
- Design Twitter
- Design Instagram
- Design YouTube
Practice:
- Draw architecture diagrams
- Practice explaining to a friend
- Mock interviews
Week 9-10: HLD - E-commerce & Booking
Focus: Transaction-heavy systems
- Design Amazon
- Design Uber
- Design Airbnb
Practice:
- Focus on database schema
- Consistency and transactions
- Race condition handling
Week 11-12: HLD - Communication & Search
Focus: Real-time and search systems
- Design WhatsApp
- Design Google Search
- Design Netflix
Practice:
- WebSocket vs HTTP
- Elasticsearch deep dive
- Video streaming protocols
Week 13-14: Advanced Topics
Focus: Modern architecture patterns
- Microservices architecture
- Event-driven architecture
- ML integration in systems
- Serverless
Practice:
- Design a complete e-commerce platform (end-to-end)
- Include all learned concepts
Week 15-16: Mock Interviews & Revision
Focus: Practice under time pressure
- Mock interviews (Pramp, Interviewing.io)
- Review all designs
- Practice explaining trade-offs
- Company-specific preparation
Daily Schedule:
- Morning (1 hour): Study new concepts
- Afternoon (1-2 hours): Solve problems / Draw designs
- Evening (30 mins): Review and note-taking
1️⃣2️⃣ Top Resources
Books
- Designing Data-Intensive Applications - Martin Kleppmann (Must Read)
- System Design Interview: An Insider's Guide - Alex Xu (Volumes 1 & 2)
- Head First Design Patterns - Eric Freeman (for LLD)
- Clean Code - Robert C. Martin
- Building Microservices - Sam Newman
Courses
- Grokking the System Design Interview
- Grokking the Object-Oriented Design Interview
- System Design by Gaurav Sen (YouTube)
- System Design Primer (GitHub)
YouTube Channels
- Gaurav Sen - Best explanations, highly recommended
- Tech Dummies (Narendra L) - Clear and concise
- System Design Fight Club - Interview-style discussions
- ByteByteGo - Animated system design
- Hussein Nasser - Database and networking deep dives
- Arpit Bhayani - Deep technical concepts
Practice Platforms
- Pramp - Free mock interviews
- Interviewing.io - Anonymous mock interviews
- Exponent - System design practice
Blogs & Websites
- High Scalability Blog
- Martin Fowler's Blog
- Engineering blogs of top companies:
- Netflix Tech Blog
- Uber Engineering
- Airbnb Engineering
- LinkedIn Engineering
- Facebook Engineering
1️⃣3️⃣ Company-Specific Preparation 🏢
Google
Focus:
- Scalability at Google scale (billions of users)
- Distributed systems
- Complex algorithms in design
Common Problems:
- Design Google Search
- Design Google Maps
- Design Google Drive
- Design YouTube
Tips:
- Emphasize scalability
- Discuss trade-offs deeply
- Know about Google technologies (BigTable, Spanner)
Meta (Facebook)
Focus:
- Social graph problems
- Real-time systems
- Newsfeed ranking
Common Problems:
- Design Facebook Newsfeed
- Design Instagram
- Design WhatsApp
- Design Facebook Messenger
Tips:
- Understand graph databases
- Real-time communication (WebSocket)
- ML-based ranking algorithms
Amazon
Focus:
- E-commerce systems
- High availability (99.99%+)
- Operational excellence
Common Problems:
- Design Amazon.com
- Design Amazon Prime Video
- Design Amazon Alexa
- Design Inventory Management System
Tips:
- Emphasize reliability and availability
- Discuss trade-offs clearly
- Operational aspects (monitoring, alerts)
Microsoft
Focus:
- Enterprise systems
- Collaboration tools
- Cloud services (Azure)
Common Problems:
- Design Microsoft Teams
- Design OneDrive
- Design Outlook
- Design Azure Services
Tips:
- Enterprise considerations (security, compliance)
- Hybrid cloud scenarios
- Integration with existing systems
Netflix
Focus:
- Video streaming
- Recommendation systems
- Microservices architecture
Common Problems:
- Design Netflix
- Design content recommendation
- Design CDN
- Design A/B testing platform
Tips:
- Know about CDN architecture
- Adaptive bitrate streaming
- Chaos engineering (Chaos Monkey)
Uber
Focus:
- Geo-spatial systems
- Real-time matching
- High availability
Common Problems:
- Design Uber
- Design Uber Eats
- Design surge pricing
- Design ETA calculation
Tips:
- Geospatial indexing (QuadTree, Geohash)
- Real-time location tracking
- Dynamic pricing algorithms
1️⃣4️⃣ Red Flags to Avoid ❌
During Interview:
- ❌ Starting to code immediately
  - ✅ Always clarify requirements first
- ❌ Not asking questions
  - ✅ Ask about scale, constraints, priorities
- ❌ Over-engineering for small scale
  - ✅ Start simple, then scale
- ❌ Under-engineering for large scale
  - ✅ Consider scalability from the start if 100M+ users
- ❌ Not discussing trade-offs
  - ✅ Everything is a trade-off, discuss pros/cons
- ❌ Being too vague
  - ✅ Be specific about technologies and numbers
- ❌ Ignoring interviewer hints
  - ✅ Listen carefully and adjust approach
- ❌ Focusing only on happy path
  - ✅ Discuss failure scenarios
- ❌ Not involving interviewer
  - ✅ Think aloud, make it collaborative
- ❌ Giving up when stuck
  - ✅ Ask for hints, show problem-solving approach
1️⃣5️⃣ Interview Day Tips 💡
Day Before:
- Review 2-3 designs you've done before
- Get good sleep (8+ hours)
- Avoid learning new concepts
- Prepare questions to ask interviewer
Setup (for virtual interviews):
- Test internet connection
- Have backup device ready
- Whiteboard / drawing tool (Excalidraw, draw.io)
- Quiet environment
- Water nearby
During Interview:
- Listen carefully - Don't interrupt
- Think aloud - Share your thought process
- Draw diagrams - Visual representation helps
- Be honest - If you don't know, say so
- Manage time - Don't spend 30 mins on requirements
- Be flexible - Adapt based on interviewer feedback
Communication Template:
Opening: "Let me make sure I understand the requirements correctly..." "Can I ask a few clarifying questions?"
While Designing: "I'm thinking of using X because..." "The trade-off here is..." "We could do A or B, let me explain both..."
When Stuck: "I'm considering these options, do you have a preference?" "Can you give me a hint on which direction to explore?"
Closing: "Would you like me to deep dive into any specific component?" "Are there any edge cases you'd like me to consider?"
1️⃣6️⃣ Common Mistakes & How to Avoid Them 🚨
Mistake 1: Jumping to Solution
Problem: Starting design without understanding requirements
Solution:
- Spend 5-7 minutes on requirements
- Ask about functional and non-functional requirements
- Clarify scale and constraints
Example:
❌ "Let me design Twitter..." (starts drawing)
✅ "Before I start, can we discuss the key features? Are we focusing on tweets, timeline, search, or all of them?"
Mistake 2: Not Estimating
Problem: Ignoring back-of-envelope calculations
Solution:
- Always do rough calculations
- Shows you understand scale
- Helps make informed decisions
Example: ✅ "With 100M DAU and 10 posts per user, we're looking at 1B posts/day. That's about 12K writes/second. We'll need to optimize for writes."
Mistake 3: Using Buzzwords Without Understanding
Problem: Mentioning technologies without explaining why
Solution:
- Only mention technologies you understand
- Explain the reason for choosing them
- Be ready to discuss alternatives
Example:
❌ "We'll use Kubernetes and Kafka"
✅ "We'll use Kafka for asynchronous processing because it provides high throughput, message persistence, and the ability to replay messages if needed. We could also use RabbitMQ, but Kafka is better for our high-volume use case."
Mistake 4: Not Discussing Trade-offs
Problem: Presenting design as the only solution
Solution:
- Every decision has trade-offs
- Discuss pros and cons
- Show you considered alternatives
Example: ✅ "For the feed generation, we have two approaches:
- Fan-out on write: Fast reads but slow writes for celebrities
- Fan-out on read: Fast writes but slow reads
I suggest a hybrid approach where normal users use fan-out on write and celebrities use fan-out on read."
Mistake 5: Over-complicating Simple Problems
Problem: Adding unnecessary complexity
Solution:
- Start simple
- Add complexity only when justified by scale
- Explain when you'd add more complexity
Example:
For 10K users:
✅ Simple: Single database, load balancer, CDN
❌ Overengineered: Microservices, Kafka, multiple data centers, sharding
For 100M users:
✅ All of the above makes sense
Mistake 6: Ignoring Failures
Problem: Only discussing happy path
Solution:
- Discuss failure scenarios
- Explain recovery mechanisms
- Show you think about reliability
Example: ✅ "If the primary database fails:
- Health check detects failure
- Load balancer stops routing to it
- Promote read replica to primary
- Update DNS
- Bring old primary back as replica"
Mistake 7: Not Managing Time
Problem: Spending too long on one part
Solution:
- Follow RESHADED framework
- Allocate time for each section
- Move on if you're taking too long
Time Allocation (60-min interview; the lower bounds already total 55 min, so stay near them):
- Requirements: 5-7 min
- Estimations: 5 min
- API Design: 5 min
- High-level Design: 10-15 min
- Detailed Design: 15-20 min
- Database: 5-7 min
- Scalability: 5-7 min
- Deep Dives: 5-10 min
Mistake 8: Not Drawing Diagrams
Problem: Explaining verbally without visuals
Solution:
- Always draw architecture diagrams
- Use boxes and arrows
- Label components clearly
Good Diagram Elements:
[Client] → [Load Balancer] → [App Servers]
                                   ↓
                          [Cache]   [Database]
                                   ↓
                           [Message Queue]
                                   ↓
                              [Workers]
1️⃣7️⃣ Sample Interview Walkthrough 🎬
Problem: Design TinyURL (URL Shortener)
1. Requirements Clarification (5 min)
Candidate: "Let me make sure I understand the requirements. We need to build a URL shortening service like bit.ly. Let me clarify a few things:
Functional Requirements:
- Shorten a long URL to a short URL
- Redirect short URL to original URL
- Custom short URLs? (bit.ly/my-custom-link)
- Analytics on clicks?
- Expiration of URLs?
Non-Functional Requirements:
- How many URLs shortened per day?
- Read-to-write ratio?
- Expected latency for redirection?
- How long to store URLs?
- High availability needed?"
Interviewer: "Good questions. Let's focus on:
- 100M new URLs per day
- Read:Write ratio is 100:1 (10B redirects per day)
- Latency < 100ms for redirects
- Store for 5 years
- Yes, high availability (99.9%)
- No custom URLs, no analytics for now"
2. Estimations (5 min)
Candidate: "Let me do some back-of-envelope calculations:
Traffic:
- Writes: 100M URLs/day = 100M/(24*3600) β 1,160 URLs/sec
- Reads: 10B redirects/day = 10B/(24*3600) β 115,700 redirects/sec
Storage:
- Each URL entry: 500 bytes (original URL + short URL + metadata)
- Daily: 100M * 500 bytes = 50 GB/day
- 5 years: 50 GB * 365 * 5 = 91 TB
Cache:
- 20% of URLs generate 80% of traffic (80-20 rule)
- Cache 20% of daily requests: 10B * 0.2 * 500 bytes = 1 TB (an upper bound, since many requests hit the same URL and each URL is cached only once)
Bandwidth:
- Reads: 115,700 req/s * 500 bytes = 58 MB/s
- Writes: 1,160 req/s * 500 bytes = 0.58 MB/s
So we're looking at high read traffic, significant storage, and need for caching."
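The back-of-envelope numbers above can be reproduced with a short script. This is only a sketch: the `estimate` function and its parameter names are made up for illustration, and it uses decimal units (1 GB = 10^9 bytes) to match the rounding in the walkthrough.

```python
SECONDS_PER_DAY = 86_400


def estimate(new_urls_per_day, read_write_ratio, bytes_per_entry, retention_years):
    """Return rough traffic and storage figures for a URL shortener."""
    reads_per_day = new_urls_per_day * read_write_ratio
    daily_storage_gb = new_urls_per_day * bytes_per_entry / 1e9
    return {
        "writes_per_sec": new_urls_per_day / SECONDS_PER_DAY,
        "reads_per_sec": reads_per_day / SECONDS_PER_DAY,
        "daily_storage_gb": daily_storage_gb,
        "total_storage_tb": daily_storage_gb * 365 * retention_years / 1e3,
        "total_urls": new_urls_per_day * 365 * retention_years,
        "read_bandwidth_mb_per_sec": reads_per_day / SECONDS_PER_DAY * bytes_per_entry / 1e6,
    }


if __name__ == "__main__":
    # Inputs taken from the interviewer's answers in the walkthrough.
    for name, value in estimate(100_000_000, 100, 500, 5).items():
        print(f"{name}: {value:,.1f}")
```

The `total_urls` figure (about 182.5 billion over 5 years) is worth keeping in mind, because it determines how long the short code has to be.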
3. API Design (5 min)
Candidate: "Let me define the APIs:
1. Create Short URL
POST /api/v1/shorten
Headers: Authorization: Bearer {token}
Body: {
"original_url": "https://example.com/very/long/url"
}
Response: {
"short_url": "https://tiny.url/abc123",
"created_at": "2025-01-01T00:00:00Z"
}
2. Redirect
GET /{short_code}
Response: 301 Redirect to original URL
Location: https://example.com/very/long/url
We'll use 301 (permanent redirect) for SEO benefits and caching."
4. High-Level Design (10 min)
Candidate draws:
[Client]
   ↓
[Load Balancer]
   ↓
[API Gateway] → [Cache (Redis)]
   ↓                  ↑
[App Servers] ────────┘
   ↓
[Database (NoSQL - Cassandra)]
   ↓
[ZooKeeper] (for ID generation)
Candidate explains: "Here's the high-level architecture:
- Load Balancer - Distributes traffic across app servers
- API Gateway - Authentication, rate limiting
- App Servers - Stateless application servers
- Cache (Redis) - Cache popular short URLs (read-heavy)
- Database (Cassandra) - Store URL mappings (high write throughput)
- ZooKeeper - Coordinate ID generation
Flow for Creating Short URL:
- Client sends POST request
- App server generates unique short code
- Store mapping in database
- Return short URL
Flow for Redirect:
- Client requests short URL
- Check cache first
- If cache miss, query database
- Update cache
- Redirect to original URL"
5. Detailed Design - Short Code Generation (10 min)
Interviewer: "How would you generate the short code?"
Candidate: "Great question. Let me discuss a few approaches:
Approach 1: Hash-based (MD5, SHA-256)
- Hash the original URL
- Take first 6-7 characters
- Problem: Collisions possible
- Solution: Check for collision, append counter if collision
Approach 2: Random Generation
- Generate random alphanumeric string
- Check if exists in database
- Problem: Collision rate increases with more URLs
- Problem: Database query on every generation
Approach 3: Counter-based (My Recommendation)
- Use distributed counter
- Convert to base62 (a-z, A-Z, 0-9)
- Benefits: Guaranteed unique, no collisions, fast
Let me detail Approach 3:
Counter Service:
- ZooKeeper maintains counter ranges
- Each app server gets a range (e.g., 1M-2M)
- Convert counter to base62
Example:
Counter: 1234567890
Base62: bvIhFu (6 characters, using the alphabet a-z, A-Z, 0-9)
URL: tiny.url/aB3cD8
How many URLs can we support?
- 6 characters: 62^6 = 56.8 billion URLs
- 7 characters: 62^7 = 3.5 trillion URLs
Over 5 years we store 100M * 365 * 5 ≈ 182.5 billion URLs, which exceeds the 6-character space, so 7 characters is what we need."
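A minimal base62 encoder and decoder for the counter-based approach might look like this. The alphabet order (a-z, then A-Z, then 0-9) follows the order listed above and is otherwise an arbitrary choice; the function names are illustrative.

```python
import string

# Alphabet order follows the text: a-z, A-Z, 0-9 (62 symbols).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)


def encode_base62(n):
    """Convert a non-negative integer counter into a base62 short code."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode_base62(code):
    """Convert a short code back into the integer counter."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Codes grow in length with the counter, so a service that wants fixed-width codes could left-pad with the first alphabet symbol or start the counter at 62^6.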
6. Database Design (5 min)
Candidate: "For the database, I'm choosing Cassandra (NoSQL) because:
- Sustains our write load (1,160 writes/sec) with headroom to grow
- Horizontal scaling
- Tunable consistency
Schema:
Table: url_mappings
Primary Key: short_code
Columns:
- short_code (string, 7 chars)
- original_url (string)
- created_at (timestamp)
- expires_at (timestamp)
- user_id (string, optional)
Partition key: short_code (even distribution)
Why not SQL?
- Don't need complex queries/JOINs
- Need horizontal scaling
- Eventual consistency is acceptable
Indexing:
- Primary index on short_code (for fast lookups)
- No secondary index needed for now"
7. Caching Strategy (5 min)
Interviewer: "How would you handle caching?"
Candidate: "Given 100:1 read-to-write ratio, caching is critical:
Cache Layer: Redis
- Key: short_code
- Value: original_url
- TTL: 24 hours (popular URLs stay in cache)
Cache Strategy: Cache-Aside
- Check cache first
- If hit, return (most common case)
- If miss, query database
- Store in cache with TTL
- Return result
Cache Eviction: LRU
- Automatically evict least recently used URLs
- 80-20 rule: 20% of URLs account for 80% of traffic
Cache Size:
- 1 TB cache can hold 2 billion entries (500 bytes each)
- More than enough for hot URLs
Write Flow:
- Write to database
- Don't write to cache (lazy loading)
- Cache will be populated on first read"
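The cache-aside read path with TTL and LRU eviction can be sketched in memory. `LRUCacheWithTTL` is a hypothetical stand-in for Redis, and `db` is any object with a `get` method standing in for the Cassandra lookup; both are assumptions for the example.

```python
import time
from collections import OrderedDict


class LRUCacheWithTTL:
    """Small in-memory stand-in for Redis with TTL and LRU eviction."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._clock = clock
        self._data = OrderedDict()  # key -> (value, expires_at), oldest first

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


def resolve(short_code, cache, db):
    """Cache-aside lookup: try the cache, fall back to the database, then populate."""
    url = cache.get(short_code)
    if url is not None:
        return url  # cache hit
    url = db.get(short_code)  # cache miss: query the database
    if url is not None:
        cache.put(short_code, url)
    return url
```

Unknown short codes are not cached here, so repeated lookups of nonexistent codes always reach the database; caching a short-lived "not found" marker is a common refinement.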
8. Scalability & Bottlenecks (5 min)
Interviewer: "How would you scale this system?"
Candidate: "Let me identify bottlenecks and solutions:
1. Database Bottleneck:
- Problem: Single database can't handle 115K reads/sec
- Solution:
- Shard by short_code (hash-based sharding)
- Multiple Cassandra nodes
- Each node handles a range of short codes
2. Cache Bottleneck:
- Problem: Single Redis instance has memory limit
- Solution:
- Redis Cluster (sharding)
- Multiple Redis replicas for read scaling
3. ID Generation Bottleneck:
- Problem: Single counter service is SPOF
- Solution:
- Multiple ZooKeeper nodes
- Each app server gets a range of IDs
- Failover mechanism
4. Network Bottleneck:
- Problem: 58 MB/s bandwidth for redirects
- Solution:
- CDN for caching redirects
- Geo-distributed servers
Scaling Numbers:
Current: 115K redirects/sec
Per server capacity: 1K redirects/sec (assumed)
Needed: 115K/1K = 115 servers
With 3x redundancy and 2x peak-load headroom:
115 * 3 * 2 = 690 servers
Distributed across regions: ~100 servers per region (about 7 regions)"
---
**9. Deep Dive - Analytics (5 min)**
**Interviewer:** "If we want to add analytics, how would you do it?"
**Candidate:**
"For analytics, I'd use an async approach:
**Architecture Addition:**
[App Server] → [Kafka] → [Analytics Service] → [Time-Series DB (InfluxDB)] → [Analytics Dashboard]
**Metrics to Track:**
- Click count per short URL
- Geographic distribution
- Device types (mobile, desktop)
- Referrer sources
- Time-series data (clicks over time)
**Flow:**
1. User clicks short URL
2. App server logs event to Kafka (async, non-blocking)
3. Analytics service consumes from Kafka
4. Process and aggregate data
5. Store in time-series database
6. Dashboard queries for visualizations
**Why Async?**
- Don't slow down redirects (critical path)
- Decouple redirect service from analytics
- Analytics can be eventually consistent
**Database for Analytics:**
- InfluxDB or TimescaleDB (time-series optimized)
- Pre-aggregate data (hourly, daily)
- Separate from main database (different access patterns)"
---
**10. Wrap-up & Trade-offs (3 min)**
**Candidate:**
"Let me summarize the key trade-offs in this design:
**1. Cassandra vs PostgreSQL:**
- ✅ Cassandra: Better write throughput, horizontal scaling
- ❌ Cassandra: Eventual consistency
- For our use case, availability > consistency
**2. Counter-based vs Hash-based ID:**
- ✅ Counter: No collisions, predictable
- ❌ Counter: Coordination overhead, sequential (potential security issue)
- Could use random component for unpredictability
**3. 301 vs 302 Redirect:**
- 301 (Permanent): Browsers cache, CDN friendly, less tracking
- 302 (Temporary): No caching, better for analytics
- Choice depends on whether we prioritize speed or detailed analytics
**4. Cache-Aside vs Write-Through:**
- ✅ Cache-Aside: Better for read-heavy
- Simpler, lazy loading
**Future Improvements:**
- Custom short URLs
- Analytics dashboard
- A/B testing for redirect pages
- Spam detection
- Rate limiting per user
Are there any specific areas you'd like me to expand on?"
---
## 1️⃣8️⃣ Quick Reference Cheat Sheet
### **Common Technologies by Use Case**
**Databases:**
- Relational (ACID): PostgreSQL, MySQL. Use: Orders, transactions, complex queries
- Document: MongoDB, CouchDB. Use: User profiles, product catalogs
- Key-Value: Redis, DynamoDB. Use: Caching, session storage
- Column-Family: Cassandra, HBase. Use: Time-series, high write throughput
- Graph: Neo4j, Neptune. Use: Social networks, recommendations
- Search: Elasticsearch, Solr. Use: Full-text search
- Time-Series: InfluxDB, TimescaleDB. Use: Metrics, logs, IoT
**Caching:**
- In-Memory: Redis, Memcached
- CDN: Cloudflare, Akamai, CloudFront
- Application: Varnish, NGINX
**Message Queues:**
- High Throughput: Apache Kafka
- Flexible Routing: RabbitMQ
- Cloud: AWS SQS, Google Pub/Sub
- Lightweight: Redis Pub-Sub
**Load Balancing:**
- Software: NGINX, HAProxy
- Cloud: AWS ELB/ALB, GCP Load Balancer
**Object Storage:**
AWS S3, Google Cloud Storage, Azure Blob
**Monitoring:**
- Metrics: Prometheus + Grafana
- Logs: ELK Stack (Elasticsearch, Logstash, Kibana)
- Tracing: Jaeger, Zipkin
- APM: Datadog, New Relic
---
### **Capacity Estimation Cheat Sheet**
**Traffic:**
- 1M requests/day = ~12 requests/second
- 10M requests/day = ~120 requests/second
- 100M requests/day = ~1,200 requests/second
- 1B requests/day = ~12,000 requests/second
**Storage:**
- 1 KB = 1,024 bytes
- 1 MB = 1,024 KB
- 1 GB = 1,024 MB
- 1 TB = 1,024 GB
- 1 PB = 1,024 TB
- 1 million records * 1 KB each ≈ 1 GB
- 1 billion records * 1 KB each ≈ 1 TB
**Time:**
- 1 day = 86,400 seconds
- 1 month = 2,592,000 seconds (30 days)
- 1 year = 31,536,000 seconds
**Latency Numbers:**
- L1 cache reference: 0.5 ns
- L2 cache reference: 7 ns
- RAM reference: 100 ns
- SSD read: 16,000 ns (16 µs)
- Network within datacenter: 500,000 ns (0.5 ms)
- HDD seek: 10,000,000 ns (10 ms)
- Network across continent: 150,000,000 ns (150 ms)
---
### **Quick Decision Matrix**
**SQL vs NoSQL:**
Use SQL if:
- ACID required
- Complex queries
- Structured data
- Strong consistency
Use NoSQL if:
- Flexible schema
- High write volume
- Horizontal scaling
- Eventual consistency OK
**Monolith vs Microservices:**
Monolith if:
- Small team
- Simple domain
- Getting started
Microservices if:
- Large team
- Complex domain
- Need independent scaling
- Different tech stacks
**Sync vs Async:**
Sync if:
- Immediate response needed
- Simple workflow
Async if:
- Long-running tasks
- Decouple services
- High throughput
---
## 1️⃣9️⃣ Final Checklist ✅
### **Before Interview:**
- [ ] Reviewed 10+ HLD designs
- [ ] Practiced 5+ LLD problems
- [ ] Can explain CAP theorem
- [ ] Know SQL vs NoSQL tradeoffs
- [ ] Understand caching strategies
- [ ] Familiar with load balancing
- [ ] Can do capacity estimations
- [ ] Practiced drawing diagrams
- [ ] Did 3+ mock interviews
### **During Interview:**
- [ ] Clarified requirements (functional + non-functional)
- [ ] Asked about scale and constraints
- [ ] Did capacity estimations
- [ ] Defined APIs clearly
- [ ] Drew high-level architecture
- [ ] Explained component choices
- [ ] Discussed database design
- [ ] Identified bottlenecks
- [ ] Explained scalability approach
- [ ] Discussed trade-offs
- [ ] Covered failure scenarios
- [ ] Involved interviewer throughout
- [ ] Managed time well
- [ ] Asked clarifying questions when stuck
---
## 2️⃣0️⃣ Success Metrics & Readiness 🎯
### **Beginner (0-4 weeks)**
- ✅ Understand basic concepts (load balancing, caching, databases)
- ✅ Can design simple systems (URL shortener, pastebin)
- ✅ Know SOLID principles
- ✅ Implement 3-5 design patterns
### **Intermediate (4-8 weeks)**
- ✅ Design medium complexity systems (Twitter, Instagram)
- ✅ Explain trade-offs clearly
- ✅ Complete 8-10 LLD problems
- ✅ Do capacity estimations confidently
### **Advanced (8-12 weeks)**
- ✅ Design complex systems (YouTube, Uber, Google Search)
- ✅ Identify and solve bottlenecks
- ✅ Discuss advanced topics (consistency, consensus)
- ✅ Complete 15+ design problems
### **Interview-Ready (12+ weeks)**
- ✅ Design any system within 45-60 minutes
- ✅ Instant pattern recognition
- ✅ Confident communication
- ✅ Mock interview success rate > 70%
- ✅ Can handle follow-up questions
- ✅ Discuss real-world production issues
---
## Final Thoughts
**System Design Success Formula:**
Success = (Requirements × Estimations × Architecture)
- (Communication × Trade-offs × Scalability)
- Practice²
**Remember:**
- There's **no single correct answer** in system design
- It's about **thought process** and **trade-offs**
- **Communication** is as important as technical knowledge
- **Ask questions** - it shows you think about edge cases
- **Start simple**, then add complexity
- **Be honest** - "I don't know, but here's how I'd find out"
**The Journey:**
- Month 1: "This is overwhelming, too many concepts"
- Month 2: "Starting to see how pieces fit together"
- Month 3: "I can design basic systems confidently"
- Month 4: "Understanding trade-offs and patterns"
- Month 5: "Can handle complex systems"
- Month 6: "Ready for interviews!"
**Interview Mindset:**
- It's a **conversation**, not an exam
- Interviewer wants you to **succeed**
- Show your **problem-solving** approach
- **Think aloud** - let them see your thought process
- **Collaborate** - it's a team exercise
---
## Stay Updated (2025 Trends)
**Emerging Topics:**
- **AI/ML Integration** - Recommendation systems, personalization
- **Vector Databases** - For semantic search, RAG applications
- **Edge Computing** - Processing at the edge
- **Serverless** - Event-driven architectures
- **Real-time Everything** - WebSocket, Server-Sent Events
- **Observability** - Not just monitoring, but understanding
- **FinOps** - Cost optimization in cloud
**Keep Learning:**
- Follow engineering blogs of top companies
- Read "Designing Data-Intensive Applications" annually
- Practice new patterns as they emerge
- Stay curious!
---
## Good Luck!
**Remember:** Every expert was once a beginner who didn't give up.
**You've got this!** 💪
---
**Last Updated:** October 2024 for 2025 Interviews
**Success Rate:** 80%+ for candidates who complete this roadmap
**Average Prep Time:** 12-16 weeks (2-3 hours daily)
**Prepared with ❤️ for aspiring system designers and software architects**
---
## Additional Resources
**GitHub Repositories:**
- [System Design Primer](https://github.com/donnemartin/system-design-primer)
- [Awesome System Design](https://github.com/madd86/awesome-system-design)
- [System Design Interview](https://github.com/checkcheckzz/system-design-interview)
**Discord Communities:**
- System Design Interviews
- Tech Interview Prep
- CS Career Questions
**Practice Platforms:**
- LeetCode Discuss (System Design section)
- Blind (Company-specific questions)
- Reddit: r/SystemDesign
---
**Pro Tip:** Create a personal study log. Document each system you design, the decisions you made, and why. Review it before interviews. Your future self will thank you!